We are re-platforming a monolithic application into microservices. During the transition phase, services will be added incrementally, and the monolith will be retired once all of them are in place. To support this, we needed to set up a real-time event processing model to keep the data stores in sync.
Kafka and Flume were the options proposed in the architecture council discussions. Kafka is a general-purpose pub-sub messaging system that offers strong durability, horizontal scalability, and fault tolerance, while Flume is a distributed system for collecting, aggregating, and moving large volumes of data from different sources into a centralized data store. Flume is tightly integrated with the Hadoop ecosystem whereas Kafka is not, so for our use case Kafka was the obvious choice.
While doing the POC on Kafka, I set out to try Flume as well, since the two have overlapping features. I wanted something simple to get my hands dirty with Flume, so I decided to use Kafka as the Flume source, with a logger as the sink.
1. Create a Flume configuration file that uses a Kafka source to send data to a logger sink, as sketched below.
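Here is a minimal sketch of such a configuration, assuming Flume 1.7 or later (which supports the `kafka.bootstrap.servers` style of Kafka source properties). The agent name `a1`, the component names, the broker address `localhost:9092`, the topic `sample-topic`, and the consumer group id are all placeholders chosen for illustration, not values from the original setup.

```properties
# kafka-logger.conf (hypothetical file name)
# Name the source, channel, and sink for the agent "a1" (assumed agent name)
a1.sources = kafkaSource
a1.channels = memChannel
a1.sinks = loggerSink

# Kafka source: consumes messages from a Kafka topic
a1.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafkaSource.kafka.bootstrap.servers = localhost:9092
a1.sources.kafkaSource.kafka.topics = sample-topic
a1.sources.kafkaSource.kafka.consumer.group.id = flume-poc
a1.sources.kafkaSource.batchSize = 100
a1.sources.kafkaSource.channels = memChannel

# In-memory channel buffering events between the source and the sink
a1.channels.memChannel.type = memory
a1.channels.memChannel.capacity = 1000
a1.channels.memChannel.transactionCapacity = 100

# Logger sink: writes each event to the Flume log at INFO level
a1.sinks.loggerSink.type = logger
a1.sinks.loggerSink.channel = memChannel
```

Assuming the file is saved as `kafka-logger.conf`, the agent can be started with something like `flume-ng agent --conf conf --conf-file kafka-logger.conf --name a1 -Dflume.root.logger=INFO,console`, after which each message published to the Kafka topic should appear as a logged Flume event on the console.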