In this step, we’ll explore how to scale your application by running multiple instances of your program. Kafka’s Consumer Groups enable load balancing and fault tolerance by distributing the processing of messages across multiple consumers within the same group.
A Consumer Group is a set of consumers that work together to consume messages from a topic. The key insight is that each partition is assigned to exactly one consumer within a group—no two consumers in the same group ever read from the same partition.
Think of it like a team splitting up work: if you have 3 partitions and 2 team members, one person handles 2 partitions while the other handles 1.
Your consumer’s group_id (e.g., "team-1") identifies which Consumer Group it belongs to. All consumers with the same group_id are part of the same group and share the workload.
Topic: new_users (5 partitions)
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│ P0 │ │ P1 │ │ P2 │ │ P3 │ │ P4 │
└───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘
│ │ │ │ │
┌───────────┼────────┼────────┼────────┼────────┼─────────────┐
│ ▼ ▼ ▼ ▼ ▼ │
│ Consumer Group: team-1 │
│ │
│ With 1 consumer: │
│ ┌───────────────────────────────────────────┐ │
│ │ Consumer A (your laptop) │ │
│ │ reads P0, P1, P2, P3, P4 │ │
│ └───────────────────────────────────────────┘ │
│ │
│ With 2 consumers (after colleague joins): │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Consumer A │ │ Consumer B │ │
│ │ reads P0, P1, P2 │ │ reads P3, P4 │ │
│ └─────────────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Rebalancing is the process where Kafka redistributes partitions among consumers. It happens when:
During a rebalance:
This typically takes a few seconds—you might notice a brief pause in message processing.
Kafka tracks the committed offset for each partition within each Consumer Group. This is how Kafka knows where each group left off, even if consumers restart.
Consumer Group: team-1
┌─────────────┬──────────────────┐
│ Partition │ Committed Offset │
├─────────────┼──────────────────┤
│ P0 │ 142 │ ← Last processed message
│ P1 │ 98 │
│ P2 │ 201 │
│ P3 │ 67 │
│ P4 │ 189 │
└─────────────┴──────────────────┘
When a consumer (re)starts, it asks: “What was the last offset I committed for each partition?” and resumes from there.
Kafka stores committed offsets in a special internal topic called __consumer_offsets. This is a compacted topic (see Step 5) that keeps the latest offset for each (group_id, topic, partition) combination.
How offset commits work:
kafka-python auto-commits every 5 seconds__consumer_offsets__consumer_offsets to find where it left offWhat this means for your application:
Note: You can disable auto-commit and manually commit offsets for finer control, but that’s beyond this tutorial’s scope.
You cannot have more active consumers than partitions. If you have 3 partitions and 4 consumers, one consumer will sit idle with nothing to read.
| Consumers | Partition Assignment | Notes |
|---|---|---|
| 1 | A: P0, P1, P2, P3, P4 | One consumer handles everything |
| 2 | A: P0, P1, P2 B: P3, P4 | Work is split (3+2) |
| 3 | A: P0, P1 B: P2, P3 C: P4 | Better distribution (2+2+1) |
| 5 | A: P0 B: P1 C: P2 D: P3 E: P4 | Maximum parallelism |
| 6 | A: P0 B: P1 C: P2 D: P3 E: P4 F: (idle) | Extra consumer is wasted! |
This is why partition count is an important decision when creating topics.
Ask the instructor to increase the message production rate to simulate higher traffic.
| Leaderboard Shows | Meaning |
|---|---|
| ⚖️ in Progress | Success! 2+ consumers active |
| Only 🔌 📤 | Only one consumer running |
Congratulations, you’ve learned how to distribute and scale your Kafka-based program using Consumer Groups!
You can now head to step 5 to learn about stateful processing!