Streaming Analytics Layer for a Product That Needed to Surface User Behavior in Near Real Time
Product analytics was running on a daily batch. Product decisions needed to happen faster.
A consumer product company's product team needed to make decisions based on user behavior at a cadence the existing daily-batch analytics couldn't support. Feature experimentation, anomaly response, and product iteration were all being slowed by the gap between user action and analytical visibility.
The team had instrumented the product extensively, but the events were being collected, batched, and processed once per day. By the time the product team could see how users had reacted to a change deployed in the morning, it was the next day, and any iteration had to wait another day for the next analytical cycle. The product team's iteration rate was being capped by the analytical infrastructure rather than by their decision-making capacity.
The answer was a streaming analytics layer that surfaced user behavior data with sub-minute latency to the product team's analytical surfaces.
Event collection was changed from batch ingestion to streaming. Events flowed from the product to a Kafka backbone immediately rather than being collected in client buffers and uploaded periodically. The collection pattern changed minimally on the product side; the difference was on the ingestion side.
A streaming processing layer, built on Apache Flink, performed the aggregations the product team needed to see: funnel computations, retention windows, conversion metrics, and anomaly detection on key product surfaces. Results were materialized into a serving layer that the product team's dashboards queried.
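To make the kind of windowed aggregation described here concrete, the following is a minimal sketch in plain Python rather than Flink code. It computes a per-minute conversion rate over tumbling windows. The event names ("view", "purchase"), the Event fields, and the 60-second window are all assumptions chosen for illustration, not details of the system described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user_id: str
    name: str   # hypothetical event names, e.g. "view" or "purchase"
    ts: float   # event time in seconds since the epoch


def window_start(ts: float, size_s: int = 60) -> int:
    """Return the start of the tumbling window that contains ts."""
    return int(ts // size_s) * size_s


def conversion_by_window(events, start_event="view", goal_event="purchase", size_s=60):
    """Per window: share of users with a start event who also had a goal event.

    Simplification: ordering inside a window is ignored, and a user must
    perform both events in the same window to count as converted.
    """
    started = defaultdict(set)
    converted = defaultdict(set)
    for e in events:
        w = window_start(e.ts, size_s)
        if e.name == start_event:
            started[w].add(e.user_id)
        elif e.name == goal_event:
            converted[w].add(e.user_id)
    return {
        w: len(started[w] & converted[w]) / len(started[w])
        for w in sorted(started)
    }


if __name__ == "__main__":
    sample = [
        Event("a", "view", 0), Event("a", "purchase", 30),
        Event("b", "view", 10), Event("c", "view", 70),
    ]
    print(conversion_by_window(sample))
```

A real streaming job would also need to handle late and out-of-order events and emit results incrementally; this sketch processes a finite list only to show the grouping logic.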
The serving layer was structured for the access patterns the dashboards generated. High-cardinality real-time queries were routed to a query engine sized for that workload; lower-cardinality summary metrics were precomputed and cached. Dashboard query latency stayed under a second across the metrics the team queried most often.
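The routing idea can be sketched roughly as follows. This is an illustrative assumption about how such a split might look, not the actual serving layer: ServingRouter, the precomputed mapping, and the query_engine callable are hypothetical names. Unfiltered summary metrics are answered from precomputed values, and anything with filters falls through to the query engine.

```python
class ServingRouter:
    """Route dashboard queries to precomputed values or to a query engine."""

    def __init__(self, query_engine, precomputed):
        # query_engine: hypothetical callable (metric, filters) -> value
        self._query_engine = query_engine
        # precomputed: dict of metric name -> value, assumed to be kept
        # fresh by the streaming job
        self._precomputed = precomputed

    def query(self, metric, filters=None):
        """Return (value, source), where source is 'cache' or 'engine'."""
        if not filters and metric in self._precomputed:
            return self._precomputed[metric], "cache"
        return self._query_engine(metric, filters or {}), "engine"


if __name__ == "__main__":
    def engine(metric, filters):
        # Stand-in for a real query engine call
        return 42

    router = ServingRouter(engine, {"daily_active_users": 1000})
    print(router.query("daily_active_users"))                    # served from cache
    print(router.query("daily_active_users", {"country": "DE"})) # served by engine
```

The design choice shown is only the split itself: cheap, frequently viewed summaries never touch the query engine, which leaves its capacity for the filtered, high-cardinality queries.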
A reconciliation layer compared the streaming results against the existing daily batch outputs. Discrepancies were surfaced and investigated rather than allowed to drift. Over time, the streaming layer's outputs became authoritative and the batch layer was retired for the metrics that had moved to streaming.
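A reconciliation check of this kind might look like the sketch below. The 1% relative tolerance, the (metric, date) key shape, and the sample numbers are assumptions made up for illustration; the source does not state what thresholds or keys were used.

```python
def reconcile(streaming, batch, rel_tol=0.01):
    """Compare two {key: value} result sets and list the discrepancies.

    Returns tuples of (key, streaming_value, batch_value, reason), where
    reason is "missing" (key absent on one side) or "mismatch" (relative
    difference against the batch value exceeds rel_tol).
    """
    issues = []
    for key in sorted(set(streaming) | set(batch)):
        s, b = streaming.get(key), batch.get(key)
        if s is None or b is None:
            issues.append((key, s, b, "missing"))
            continue
        denom = max(abs(b), 1e-12)  # avoid division by zero when batch is 0
        if abs(s - b) / denom > rel_tol:
            issues.append((key, s, b, "mismatch"))
    return issues


if __name__ == "__main__":
    # Made-up sample values, keyed by (metric, date)
    streaming = {("signups", "2024-01-01"): 1000, ("purchases", "2024-01-01"): 205}
    batch = {
        ("signups", "2024-01-01"): 1004,
        ("purchases", "2024-01-01"): 200,
        ("returns", "2024-01-01"): 12,
    }
    for issue in reconcile(streaming, batch):
        print(issue)
```

Treating the batch output as the reference matches the early phase described above, when batch was still authoritative; once streaming becomes authoritative the comparison could be inverted or dropped.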
Self-service capability was added so the product team could define new metrics against the streaming layer without needing engineering work for each one. Metric definitions became a configuration surface the product team owned.
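One way to picture metric definitions as configuration is the sketch below. The schema (name, event, agg, field), the two supported aggregations, and the event dictionary shape are hypothetical; the source does not describe the actual definition format.

```python
from dataclasses import dataclass

# Aggregations this sketch supports (an assumption, not the real list)
ALLOWED_AGGS = {"count", "count_distinct"}


@dataclass(frozen=True)
class MetricDef:
    name: str
    event: str             # event name the metric is computed over
    agg: str               # one of ALLOWED_AGGS
    field: str = "user_id" # field used by count_distinct


def parse_metric(cfg: dict) -> MetricDef:
    """Validate a config dict and turn it into a MetricDef."""
    for key in ("name", "event", "agg"):
        if key not in cfg:
            raise ValueError(f"missing required key: {key}")
    if cfg["agg"] not in ALLOWED_AGGS:
        raise ValueError(f"unsupported aggregation: {cfg['agg']}")
    return MetricDef(cfg["name"], cfg["event"], cfg["agg"], cfg.get("field", "user_id"))


def evaluate(metric: MetricDef, events) -> int:
    """Evaluate a metric over an iterable of event dicts."""
    matching = [e for e in events if e["event"] == metric.event]
    if metric.agg == "count":
        return len(matching)
    return len({e[metric.field] for e in matching})


if __name__ == "__main__":
    cfg = {"name": "unique_checkout_users", "event": "checkout", "agg": "count_distinct"}
    events = [
        {"event": "checkout", "user_id": "a"},
        {"event": "checkout", "user_id": "a"},
        {"event": "view", "user_id": "b"},
    ]
    print(evaluate(parse_metric(cfg), events))
```

Validating definitions up front is what makes it reasonable to hand the configuration surface to a non-engineering team: a malformed metric is rejected at definition time rather than failing inside the processing job.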
Product team iteration cadence accelerated noticeably. Decisions that had previously waited multiple days for supporting data could now be informed within an hour. The product team's response to anomalies (feature adoption issues, unexpected user behavior, performance regressions visible in user metrics) moved from next-day reaction to near real time. The analytics infrastructure now supported the product team's ambition instead of capping it.