You’ve recently joined a fast-growing startup as an intern. Your mentor has tasked you with developing small applications to help the company manage its operations. The startup is seeing a rapid increase in new user registrations, and there’s a need to process these registrations efficiently.
Each new registration is sent to Kafka, and your task is to handle various small operations triggered by these messages.
Apache Kafka is a distributed streaming platform. Think of it as a highly scalable message queue. The following terms come up throughout this tutorial:
| Term | Definition |
|---|---|
| Broker | A Kafka server that stores and serves messages |
| Cluster | Multiple brokers working together as one system |
| Topic | A named category/feed of messages (like a database table) |
| Partition | A subset of a topic that enables parallel processing |
| Offset | A message’s sequential position within a partition (0, 1, 2, …) |
| Consumer Group | A set of consumers that share the work of reading a topic |
| Producer | An application that writes messages to Kafka |
| Consumer | An application that reads messages from Kafka |
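
To make the partition and offset vocabulary concrete, here is a minimal in-memory sketch. It is not Kafka code: `ToyTopic` is a hypothetical class invented for illustration, and the CRC32 hash is a stand-in (Kafka's default partitioner uses a different hash). It only shows that messages with the same key land in the same partition and receive increasing offsets.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a Kafka topic: a list of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append a message and return its (partition, offset) coordinates."""
        # Simplification: Kafka's default partitioner uses a murmur2 hash;
        # CRC32 is used here only because it ships with the standard library.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1


topic = ToyTopic("new_users", num_partitions=3)
first = topic.append("user-1", "signed up")
second = topic.append("user-1", "upgraded")
print(first, second)  # same partition, consecutive offsets
```

Because offsets are assigned per partition, ordering is only guaranteed among messages that share a partition.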
Traditional message queues delete messages after delivery. Kafka retains messages for a configurable period, so consumers can re-read earlier messages and several independent applications can consume the same data.
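
The effect of retention can be sketched with a toy single-partition log. This is an illustration only: `RetainedLog` and the group names `email-service` and `audit-service` are hypothetical, and expiry of old messages is not modelled. The point is that reading does not delete anything, so each consumer group tracks its own position and can rewind.

```python
class RetainedLog:
    """Toy single-partition log that keeps messages after they are read."""

    def __init__(self):
        self.messages = []
        self.committed = {}  # consumer group -> next offset to read

    def append(self, value):
        self.messages.append(value)

    def poll(self, group):
        """Return the next unread message for `group`, or None if caught up."""
        offset = self.committed.get(group, 0)
        if offset >= len(self.messages):
            return None
        self.committed[group] = offset + 1
        return self.messages[offset]

    def seek_to_beginning(self, group):
        """Rewind a group so it re-reads every retained message."""
        self.committed[group] = 0


log = RetainedLog()
for event in ["alice registered", "bob registered"]:
    log.append(event)

# The first group reads both messages.
emails = [log.poll("email-service"), log.poll("email-service")]
# A second group still starts from offset 0: reading did not delete anything.
audit_first = log.poll("audit-service")
# The first group can also rewind and replay.
log.seek_to_beginning("email-service")
replayed = log.poll("email-service")
```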
Each team will develop an application that follows a common pattern known as ETL (Extract, Transform, Load):
┌──────────┐   new_users   ┌─────────────┐   actions    ┌─────────────┐
│ Producer │ ────────────► │  Your App   │ ───────────► │ Leaderboard │
└──────────┘               │  (Python)   │              └─────────────┘
                           └─────────────┘
                                  │
                                  │ team_stats (Step 5)
                                  ▼
                           ┌─────────────┐
                           │  Compacted  │
                           │    Topic    │
                           └─────────────┘
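
The flow in the diagram can be sketched in Python roughly as follows. Treat this as a sketch under assumptions, not the tutorial's reference solution: it assumes messages are JSON-encoded, that the `confluent-kafka` package is the client library, and that a broker is reachable at `localhost:9092`. The group id `team-0`, the filter rule, and the shape of the action message are hypothetical, and connection settings such as credentials are omitted.

```python
import json


def extract(raw_value):
    """Extract: decode a Kafka message value (bytes) into a dict."""
    return json.loads(raw_value.decode("utf-8"))


def transform(user):
    """Transform: return an action dict if the user matches, else None.

    Hypothetical rule: flag users with a negative credit balance.
    """
    if user.get("credit", 0) < 0:
        return {"team": "team-0", "type": "negative_credit", "user": user.get("name")}
    return None


def load(action):
    """Load: serialize the action to bytes, ready to be produced to `actions`."""
    return json.dumps(action).encode("utf-8")


def process(raw_value):
    """Run one message through the pipeline; None means nothing to produce."""
    action = transform(extract(raw_value))
    return load(action) if action is not None else None


def run(bootstrap_servers="localhost:9092", group_id="team-0"):
    """Wire the pipeline to Kafka. Requires the confluent-kafka package."""
    from confluent_kafka import Consumer, Producer  # imported lazily on purpose

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    })
    producer = Producer({"bootstrap.servers": bootstrap_servers})
    consumer.subscribe(["new_users"])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue  # nothing received, or an error this sketch ignores
            out = process(msg.value())
            if out is not None:
                producer.produce("actions", value=out)
                producer.poll(0)  # serve delivery callbacks
    finally:
        producer.flush()
        consumer.close()


# The pure functions can be exercised without a broker:
sample = json.dumps({"name": "Ada", "credit": -3}).encode("utf-8")
result = process(sample)
skipped = process(json.dumps({"name": "Bob", "credit": 5}).encode("utf-8"))
```

Keeping `extract`, `transform`, and `load` as plain functions means the filter logic can be tested locally before `run()` is pointed at a real cluster.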
These applications can be written in any language, but for this tutorial, support will be provided for the following languages:
Choose one of the following options to set up your development environment:
git clone https://github.com/PierreZ/kafka-tutorial.git
cd kafka-tutorial
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Dev Containers: run the VS Code command “Dev Containers: Reopen in Container”.

To connect to Kafka from your application, you can use the following libraries depending on the language you choose:
The producer generates fake user data with some patterns you should know about:
| Field | Distribution | Relevant For |
|---|---|---|
| `avatar` | ~90% robohash URLs, ~10% example.org | Team-13 (invalid avatar) |
| `name` | ~90% generated names, ~10% “John Doe” | Team-14 (suspicious name) |
| `premium` | 50% true, 50% false | Team-7 (VIP users) |
| `pack` | ~90% “small”, ~10% “free” | Team-15 (upgrade free) |
| `credit` | Range: -20 to +20 | Team-7 (>10), Team-8 (<-15) |
These distributions tell you what match rate to expect. If your filter never matches, double-check your logic!
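
As a rough illustration, the hints in the table could translate into predicates like the ones below. The exact rules for each team are assumptions inferred from the table (for example, that Team-7 combines `premium` and `credit > 10` with AND, and that an "invalid avatar" means an example.org URL), and the sample user and its avatar URL are made up.

```python
def is_vip(user):
    """Team-7: premium user with credit above 10 (AND is an assumption)."""
    return user["premium"] is True and user["credit"] > 10


def is_low_credit(user):
    """Team-8: credit below -15."""
    return user["credit"] < -15


def has_invalid_avatar(user):
    """Team-13: assumed to mean an avatar hosted on example.org."""
    return "example.org" in user["avatar"]


def has_suspicious_name(user):
    """Team-14: the placeholder name from the table."""
    return user["name"] == "John Doe"


def is_on_free_pack(user):
    """Team-15: users who could be offered an upgrade."""
    return user["pack"] == "free"


FILTERS = {
    "Team-7": is_vip,
    "Team-8": is_low_credit,
    "Team-13": has_invalid_avatar,
    "Team-14": has_suspicious_name,
    "Team-15": is_on_free_pack,
}


def matching_teams(user):
    """Return the teams whose filter matches this user, in FILTERS order."""
    return [team for team, predicate in FILTERS.items() if predicate(user)]


user = {
    "avatar": "https://example.org/avatar.png",  # hypothetical URL
    "name": "John Doe",
    "premium": True,
    "pack": "small",
    "credit": 12,
}
matches = matching_teams(user)
print(matches)  # ['Team-7', 'Team-13', 'Team-14']
```

Counting how often your own predicate returns true over a few hundred messages and comparing it with the table is a quick sanity check.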
Your instructor displays a real-time leaderboard that tracks your team’s progress!
| Step | Achievement | Emoji | How to Unlock |
|---|---|---|---|
| 1 | Connected | 🔌 | Consumer group becomes active |
| 3 | First Load | 📤 | Produce first valid action message |
| 4 | Scaled | ⚖️ | Have 2+ consumers in your group |
| 5 | Stats Published | 📊 | Produce first stats message with key |
Step 2 (Transform) has no achievement of its own; your filter is verified when 📤 unlocks.
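
Step 5 asks for a stats message with a key because the stats topic is compacted: Kafka eventually keeps only the most recent value for each key. The toy below imitates that behaviour in memory. The payload fields `processed` and `matched` and the team names are hypothetical, and real compaction runs in the background on the broker, so older values can remain visible for some time.

```python
import json


def stats_record(team, processed, matched):
    """Build a (key, value) pair; the key is what compaction groups on."""
    key = team.encode("utf-8")
    value = json.dumps({"processed": processed, "matched": matched}).encode("utf-8")
    return key, value


def compact(records):
    """Toy log compaction: keep only the latest value for each key."""
    latest = {}
    for key, value in records:
        latest[key] = value
    return latest


records = [
    stats_record("team-0", processed=10, matched=1),
    stats_record("team-1", processed=8, matched=0),
    stats_record("team-0", processed=20, matched=3),
]
snapshot = compact(records)
```

With a stable key per team, a consumer that reads the compacted topic from the beginning ends up with one current stats entry per team rather than the full history.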
Now that you have the context, you’re ready to dive into the next step! Continue on to Step 1 to get started.