Flume Introduction

What is Flume?

Apache Flume is a data ingestion service for collecting, aggregating, and transporting large amounts of streaming data, such as log files and events, from various sources to a centralized data store.

Flume is a highly reliable, distributed, and configurable tool. It is principally designed to copy streaming data (log data) from various web servers to HDFS.
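As a rough illustration of the web-server-to-HDFS use case described above, here is a minimal sketch of a Flume agent configuration. The agent name (a1), the component names (r1, c1, k1), the log file path, and the HDFS URL are all hypothetical placeholders, and the capacity values are arbitrary; adjust them to your environment.

```properties
# Hypothetical agent "a1" with one source, one channel, and one sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow a web server access log (the path is an assumption).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between the source and the sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: write events to HDFS as plain text (host and path are assumptions).
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/weblogs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1
```

Assuming this is saved as a file named example.conf, an agent could typically be started with something like `flume-ng agent --conf conf --conf-file example.conf --name a1`. Note that the exec source with `tail -F` is convenient for a demo but gives no delivery guarantee if the agent fails, so it may not be the right choice where reliability matters.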

[Figure: Apache Flume overview]

Architecture

[Figure: Flume architecture]

Flume Event

[Figure: Flume event]

Flume Agent

[Figure: Flume agent]

Data Flow

[Figure: Flume data flow]

References