DBS - Blog

A blog on my experiments and experiences with technology

Processing Streaming Twitter Data using Kafka and Spark — Part 1: Setting Up Kafka Cluster

I love travelling, so finding ways to save money and travel cheaply is essential. There’s little to save when booking flights, and I’d rather pay more to travel comfortably on a 16-hour flight than try to save and wear myself down even before the trip begins! But we can always save a few bucks on accommodation.

Processing Streaming Twitter Data using Kafka and Spark - Part 2: Creating Kafka Twitter stream producer

Architecture

Before we start implementing any component, let’s lay out an architecture, or block diagram, which we will build piece by piece throughout this series. Since our intention is to learn several technologies through a single use case, this fits just right.

Processing Streaming Twitter Data using Kafka and Spark — Part 1: Setting Up Kafka Cluster

As per the plan I laid out in my previous post, I’ll start by setting up a Kafka cluster. I’ll primarily be working on Google Cloud instances throughout this series, but I’ll also lay out the steps to set up the same on your local machine.