I’ve been writing ETL jobs using Kafka for a couple of years now. In that time, I’ve done just about everything wrong before figuring out what does work. This talk will cover:
-What Kafka is
-What the major frameworks are, and how they steer you towards one-by-one message processing
-Why you shouldn’t do that, including performance measurements for different methods of loading data into a Postgres data warehouse
-How to avoid one-by-one processing
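
As a rough illustration of the last point, here is a minimal Python sketch of grouping messages into batches so that each batch can be written to the warehouse in one operation instead of one write per message. The names `batches` and `load_batch` are hypothetical and not taken from any Kafka framework; the in-memory `sink` list stands in for a bulk write to Postgres.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(messages: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` messages from any iterable (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(messages)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_batch(rows: List[T], sink: List[List[T]]) -> None:
    # Assumption: stand-in for a single bulk write (for example a multi-row
    # INSERT or COPY into Postgres); here it only records the batch.
    sink.append(rows)


if __name__ == "__main__":
    sink: List[List[int]] = []
    # `range(10)` stands in for a stream of consumed messages.
    for batch in batches(range(10), size=4):
        load_batch(batch, sink)
    print(len(sink), "bulk writes for 10 messages")
```

A real consumer would usually also flush on a timer so that a slow topic does not hold a partial batch indefinitely, and would commit offsets only after the batch has been written.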