
Both Flume and Kafka are used for real-time event processing, but they differ from each other in several important ways:


1. Kafka is a general-purpose publish-subscribe messaging system. It is not designed specifically for Hadoop; the Hadoop ecosystem is merely one of its possible consumers.
Flume, on the other hand, is part of the Hadoop ecosystem and is used for efficiently collecting, aggregating, and moving large amounts of data from many different sources into a centralized data store such as HDFS or HBase. It is much more tightly integrated with Hadoop; for example, the Flume HDFS sink integrates well with HDFS security.
Its most common use case is therefore to act as a data pipeline for ingesting data into Hadoop, as the sketch below shows.
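A minimal sketch of such a pipeline, assuming a hypothetical agent named a1 that tails /var/log/app.log into an HDFS directory using only Flume's built-in exec source, file channel, and HDFS sink (all names and paths are illustrative placeholders):

    # Hypothetical agent "a1": built-in exec source -> file channel -> HDFS sink
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app.log
    a1.sources.r1.channels = c1

    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /var/flume/checkpoint
    a1.channels.c1.dataDirs = /var/flume/data

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /data/logs/%Y-%m-%d
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.channel = c1

No custom code is involved; the agent would be started with the stock flume-ng launcher pointing at this properties file.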
2. It is very easy to increase the number of consumers in Kafka without affecting performance and without any downtime.
The broker also keeps no record of which messages in a topic have been delivered to which consumers; it is each consumer's responsibility to track its own position through offsets.
This makes Kafka very scalable, in contrast to Flume, where adding more consumers means changing the topology of the pipeline design, which also requires some downtime. A minimal consumer illustrating both points follows.
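A minimal Java sketch of such a consumer, using the standard Kafka client API; the broker address, group id, and topic name are illustrative assumptions. Scaling out is simply a matter of starting another process with the same group.id:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class OffsetTrackingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "demo-group");              // more consumers = same group.id, no downtime
            props.put("enable.auto.commit", "false");         // the consumer, not the broker, tracks its position
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events")); // hypothetical topic
                while (true) {
                    // Pull model: the consumer asks the broker for data at its own pace
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                    }
                    consumer.commitSync(); // persist the offset once processing has succeeded
                }
            }
        }
    }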
3. Kafka works on a pull model: different consumers can pull data from their respective topics at the same time and process it in real-time or in batch mode, as the poll loop in the sketch above illustrates. Flume, on the contrary, uses a push model, so there is a chance of data loss if the consumer does not recover its data quickly.
4. Kafka supports both synchronous and asynchronous replication, depending on your durability requirements, and it runs on commodity hard drives. Flume supports both an ephemeral memory-based channel and a durable file-based channel. Even with the durable file-based channel, any event stored in the channel but not yet written to a sink is unavailable until the agent recovers.
Moreover, the file-based channel does not replicate event data to a different node; its durability depends entirely on the storage it writes to. The sketch below shows how a Kafka producer selects its durability level.
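A hedged producer sketch showing how that durability level is chosen through the acks setting (broker address and topic are again placeholders). acks=all holds the acknowledgment until all in-sync replicas have the record, while acks=1 returns as soon as the leader alone has written it:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class DurableProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("acks", "all");                         // strongest setting; "1" trades durability for latency
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record = new ProducerRecord<>("events", "hello"); // hypothetical topic
                // Synchronous send: block until the broker acknowledges the write
                RecordMetadata meta = producer.send(record).get();
                System.out.println("acked at offset " + meta.offset());
                // Asynchronous send: continue immediately, handle the result in a callback
                producer.send(record, (m, e) -> {
                    if (e != null) e.printStackTrace();
                });
            }
        }
    }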
5. For Kafka we need to write our own producer and consumer code (as in the sketches above), whereas Flume ships with built-in sources and sinks that can be used out of the box, as in the configuration in point 1. On the other hand, if a Flume agent fails while events are still sitting in a memory channel, those events are lost.
6. Kafka does not provide native support for message processing, so it always needs to be integrated with an external event-processing framework. Flume, in contrast, supports several data-flow models and interceptor chaining, which make event filtering and transformation very easy. For example, you can filter out messages you are not interested in early in the pipeline, before sending them over the network, for obvious performance reasons, as the interceptor sketch below shows.
However, Flume is not suitable for complex event processing.
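For instance, a sketch using Flume's built-in regex_filter interceptor, attached to the source r1 of the configuration in point 1; the DEBUG pattern is an assumed example. Matching events are dropped before they ever reach the channel or the network:

    # Drop events whose body matches the pattern before they enter the channel
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = regex_filter
    a1.sources.r1.interceptors.i1.regex = .*DEBUG.*
    a1.sources.r1.interceptors.i1.excludeEvents = true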
