Kafka's Role in Microservices Solutions

Apache Kafka is an open-source streaming platform and also a distributed log. Kafka makes sense in many scenarios, such as microservices, analytics and machine learning, observability and log ingestion, click-stream analytics, and even as a non-traditional messaging platform.

Kafka: Eventual Consistency, Stream and Beyond

The main story about Kafka could easily be sold as a messaging solution for eventually consistent microservices. Isolation should be one of the main properties of microservices, which means each service owns its database and does not share it with external consumers. Having said that, there is a need for an eventual-consistency "bus", because you can't just run a single database query to answer every question you have. Applications can do the famous client-side join; however, this can easily become an anti-pattern depending on the data you are reading (for example, when the join requires scanning multiple DB partitions).
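To make the alternative to the client-side join concrete, here is a minimal sketch, assuming two hypothetical topics, "customers" and "orders", and a hypothetical reporting service. Instead of calling the other services' databases at request time, the service consumes their events and keeps its own pre-joined read model. Plain Python objects stand in for Kafka records; no Kafka client API is used.

```python
from dataclasses import dataclass


@dataclass
class Event:
    topic: str   # hypothetical topic names: "customers" or "orders"
    key: str
    value: dict


class OrderSummaryView:
    """Hypothetical read model owned by a reporting service.

    Rather than joining customer and order data on the client at query time,
    it applies events from both topics and keeps a local, pre-joined copy.
    """

    def __init__(self) -> None:
        self.names: dict[str, str] = {}          # customer_id -> name
        self.totals: dict[str, list[int]] = {}   # customer_id -> order totals

    def apply(self, event: Event) -> None:
        if event.topic == "customers":
            self.names[event.key] = event.value["name"]
        elif event.topic == "orders":
            customer_id = event.value["customer_id"]
            self.totals.setdefault(customer_id, []).append(event.value["total"])

    def summary(self, customer_id: str) -> dict:
        # Eventually consistent: the name stays None until the customer event arrives.
        totals = self.totals.get(customer_id, [])
        return {
            "name": self.names.get(customer_id),
            "order_count": len(totals),
            "total_spent": sum(totals),
        }
```

If an order event is applied before the matching customer event, `summary` returns the order data with `name` still `None`; once the customer event arrives, the view catches up. That window is exactly the eventual consistency the bus trades for service isolation.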

So there are a couple of solutions that Kafka enables. First of all, you could use Kinesis to do the same job; however, the main issue is that you would be coupled to AWS and unable to port your data workloads to other clouds if needed. Having Kafka means we can use stream-processing solutions like Spark, Flink, or even Kafka Streams to process the data in near-real-time or even in batch.

Kafka is great because it allows us to use the same architecture and tools to do both batch and streaming (the Kappa Architecture). So not only do you have a solution that works great for eventual consistency across your microservices, but you also have one that is great for analytics.
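A minimal sketch of the Kappa idea, assuming an in-memory list as a stand-in for a Kafka partition (`InMemoryLog`, `count_page_views`, `run_batch`, and `StreamingJob` are hypothetical names). The point it shows: there is one piece of processing logic, and "batch" simply means replaying the log from offset 0, while "streaming" means tracking the next offset and processing only new records.

```python
from collections import Counter


def count_page_views(state: Counter, record: dict) -> None:
    """The single piece of business logic, shared by both paths."""
    state[record["page"]] += 1


class InMemoryLog:
    """Stand-in for a Kafka partition: an append-only list addressed by offset."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def append(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset: int) -> list[dict]:
        return self.records[offset:]


def run_batch(log: InMemoryLog) -> Counter:
    """'Batch' in Kappa terms: replay the whole log from offset 0."""
    state: Counter = Counter()
    for record in log.read_from(0):
        count_page_views(state, record)
    return state


class StreamingJob:
    """'Streaming': remember the next offset and process only new records."""

    def __init__(self, log: InMemoryLog) -> None:
        self.log = log
        self.next_offset = 0
        self.state: Counter = Counter()

    def poll(self) -> Counter:
        new_records = self.log.read_from(self.next_offset)
        for record in new_records:
            count_page_views(self.state, record)
        self.next_offset += len(new_records)
        return self.state
```

After every poll, the streaming state matches what a full batch replay of the same log would compute, which is why one codebase can serve both analytics and near-real-time use cases.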

Kafka can also be used as a source of truth. In the spirit of AWS Aurora's "the log is the database" design, applications can store raw events in Kafka and use features like log compaction, which retains only the latest record for each key, to discard superseded values and save space. This pattern enables polyglot processing: the application can feed whatever store makes sense, such as Redis for caching, Elasticsearch for full-text search, JanusGraph for graphs, Cassandra for time series, and so on.
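A minimal sketch of what log compaction keeps, assuming records are `(key, value)` pairs and a `None` value is a tombstone (a delete marker). It is a simplification, not Kafka's implementation: real Kafka keeps tombstones for `delete.retention.ms` before removing them and never compacts the active segment, whereas this sketch compacts everything and drops tombstones immediately. `compact` and `rebuild_cache` are hypothetical names.

```python
from typing import Optional

# A log record is (key, value); a value of None is a tombstone meaning "delete this key".
Record = tuple[str, Optional[str]]


def compact(log: list[Record]) -> list[tuple[int, str, str]]:
    """Keep only the latest record per key, preserving the original offsets.

    Simplified vs. real Kafka: tombstones are dropped right away and the
    whole log is compacted, including what would be the active segment.
    """
    last_offset: dict[str, int] = {}
    for offset, (key, _) in enumerate(log):
        last_offset[key] = offset
    return [
        (offset, key, value)
        for offset, (key, value) in enumerate(log)
        if last_offset[key] == offset and value is not None
    ]


def rebuild_cache(compacted: list[tuple[int, str, str]]) -> dict[str, str]:
    """Rebuild a derived store (for example a cache) by replaying the compacted log."""
    return {key: value for _, key, value in compacted}
```

For a log of `[("user-1", "a@old.com"), ("user-2", "b@x.com"), ("user-1", "a@new.com"), ("user-2", None)]`, compaction keeps only offset 2 for `user-1`, and `user-2` disappears because its latest record is a tombstone. Any downstream store (cache, search index, graph) can be rebuilt by replaying that compacted log.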

Kafka performance & Scalability

According to LinkedIn benchmarks, Kafka can easily do 2M writes per second on commodity hardware. A single producer thread can produce more than 800K records per second, with a median latency of around 2 ms and a p99 of 14 ms. LinkedIn's numbers from 2019 are super impressive: 7 trillion messages per day. Kafka's performance does not degrade with the amount of data stored on disk, but it does depend on message size: the bigger the messages, the fewer records per second Kafka can handle.
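To put the LinkedIn figure in per-second terms, a quick back-of-the-envelope conversion (a day has 86,400 seconds) gives the average rate; peak rates are necessarily at least as high as this average:

$$
\frac{7 \times 10^{12}\ \text{messages/day}}{86\,400\ \text{s/day}} \approx 8.1 \times 10^{7}\ \text{messages/s}
$$

That is roughly 81 million messages per second, on average, across LinkedIn's Kafka deployment.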

Kafka is not a Database

Via Kafka Streams, you can have "views" of the data. However, Kafka (the log) does not guarantee data integrity the way a database does, so that is something the application needs to work on. Also, there will be replication lag, especially in multi-region setups; however, no matter which solution you choose, we are talking about physics, so you will have replication lag anyway.
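One way to see how far a view trails the log is consumer lag: per partition, the log-end offset (the offset the next produced record will get) minus the committed offset (the next record the consumer will read). The sketch below is a minimal illustration with plain dictionaries standing in for offsets that a real deployment would read from the brokers; `consumer_lag` and `view_is_current` are hypothetical names, and treating a partition with no committed offset as starting from 0 is an assumption of this sketch.

```python
def consumer_lag(log_end_offsets: dict[int, int], committed_offsets: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: how many records the view has not applied yet.

    Log-end offset = offset of the last record + 1; committed offset = next
    record to read. Their difference is the number of pending records.
    """
    # Assumption for this sketch: a partition with no committed offset is read from 0.
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


def view_is_current(lag: dict[int, int]) -> bool:
    """A view reflects everything written so far only when every partition has zero lag."""
    return all(pending == 0 for pending in lag.values())
```

For example, with log-end offsets `{0: 120, 1: 98}` and committed offsets `{0: 120, 1: 90}`, partition 1 is 8 records behind, so any query against the view may return stale data until that gap closes.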

Amazon Kinesis Comparison & Multi-region Reliability 

Kafka is super easy to use but not that easy to operate at scale, especially in multi-region setups, where you might have issues like replication lag (MirrorMaker). Spotify recently migrated away from Kafka due to replication issues with MirrorMaker and also reliability concerns. Kafka has about 3x lower latency than AWS Kinesis; however, Kafka can get hard to operate if you use it as a source of truth in a multi-region scenario, as Spotify described. To mitigate these issues, you could use Confluent Cloud or even AWS MSK, which is a managed Kafka service.

Kafka: Great for Data Ingestion

Kafka is pretty good for big data because of its ability to ingest large volumes of data. That pattern applies not only to analytics but also to other infrastructure solutions, like log ingestion (in the observability realm) and click-stream data ingestion (for business analysis, data trends, strategic analysis, and understanding user profiles).

Diego Pacheco
