IoT Infrastructure on Kubernetes

A cloud-native, event-driven microservices architecture for industrial IoT monitoring, built and deployed on Kubernetes. Course project for Cloud Computing Technologies (A.Y. 2025/2026).

Architecture Overview

The system is organized into three logical namespaces - kong, kafka, and metrics - following a strict Separation of Concerns principle. Data flows from edge sensors through an API Gateway, into a persistent streaming layer, and finally into an optimized time-series database.

The event-driven design fully decouples ingestion from processing: if the consumer slows down or goes offline, messages accumulate durably in Kafka and are processed upon recovery - with zero data loss.

Key Components

Kong API Gateway - Edge Security

Kong acts as the single entry point for all sensor traffic, implementing the Gateway Offloading pattern:

Authentication: API Key validation at the edge - invalid requests are rejected before reaching any backend service
Rate Limiting: 5 requests/second per client, returning 429 Too Many Requests on excess
Load Balancing: Round-robin distribution across Producer replicas

The entire configuration is Infrastructure as Code via Kubernetes CRDs - no manual UI, fully reproducible.

Apache Kafka - Reliable Streaming

Kafka runs in KRaft mode (no ZooKeeper), reducing operational complexity and attack surface. Two dedicated topic strategies handle different QoS requirements:

Topic	Partitions	Compression	Retention	Guarantee
`sensor-telemetry`	3	LZ4	7 days	At-least-once
`sensor-alerts`	2	-	30 days	Zero Data Loss (`min.insync.replicas: 2`)

LZ4 compression reduces JSON network traffic by up to 60% with minimal CPU overhead.

MongoDB - Time Series Storage

Data is stored using MongoDB’s native Time Series Collections, which provide:

Automatic Zstd compression (~70% storage reduction on historical data)
Clustered indexes on timestamp for efficient range queries
TTL auto-expiry at 30 days (expireAfterSeconds: 2592000)

MongoDB is deployed as a StatefulSet to ensure stable pod identity and persistent volume reattachment across crashes.

Python Microservices

Three stateless/stateful services handle the full pipeline:

Producer - Flask REST API; enriches each message with a UUID idempotency token and Kafka topic routing
Consumer - Stateful worker (3 replicas, one per partition); normalizes and persists data to MongoDB
Metrics Service - Exposes 5 REST analytics endpoints (temperature averages by zone, alert breakdown, 7-day activity trend, firmware status)

Non-Functional Properties

Property	Implementation
Security	TLS on port 9093, SASL/SCRAM-SHA-512, Kubernetes Secrets (no hardcoded credentials)
Fault Tolerance	Kafka as async buffer; consumer crash → messages preserved → zero data loss on recovery
Self-Healing	Kubernetes restarts crashed pods automatically
Scalability	HPA configured at 50% CPU threshold; scales Producer 1→4 replicas under load

View on GitHub