Product Analytics Infrastructure
Overview
Product analytics infrastructure is the pipeline that turns user interactions into metrics that drive business decisions. Page views, signups, conversions, feature adoption - these numbers appear on dashboards and inform product strategy, marketing spend, and engineering prioritization.
The problem is that the pipeline between user action and dashboard number is long, lossy, and unreliable. Events are generated in environments you do not control (browsers, mobile apps), transmitted over unreliable networks, processed through multi-stage pipelines, and aggregated into metrics that are presented with false precision.
Building analytics infrastructure that teams can actually trust requires understanding where data loss happens, why attribution breaks, and how to reconcile client-side signals with server-side truth.
Problems
- Client-side events fail silently. Ad blockers suppress analytics scripts entirely. Network errors drop event payloads. Tab closures kill in-flight requests. Users with flaky connections lose events without any error being logged. The analytics system never knows these events existed.
- Attribution breaks across sessions and devices. A user clicks a paid ad on their phone, researches on their tablet, and signs up on their laptop. Each session is a separate anonymous user in the analytics system. The conversion is attributed to "direct" traffic on the desktop, while the paid campaign gets no credit.
- Pipeline lag creates stale dashboards. Events flow through collection endpoints, message queues, processing workers, and warehouse loading jobs. Each stage introduces latency. By the time numbers appear on a dashboard, they may be minutes or hours old. Decisions made on stale data compound errors.
- Retry and deduplication conflicts. Checkout retries create duplicate events in the analytics layer even when the payment system correctly deduplicates. The funnel shows inflated starts and deflated conversion rates. The numbers are internally consistent but wrong.
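The duplicate-event problem can be addressed at ingestion if clients attach a unique event ID to every payload and retries reuse it. A minimal sketch of the idea, assuming a client-generated `event_id` field and an in-memory TTL cache (a production system would use a shared store such as Redis):

```python
import time

class EventDeduplicator:
    """Drops events whose client-generated event_id was already seen
    within a TTL window. Retried checkout events carry the same ID,
    so only the first copy reaches the funnel metrics."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.seen = {}  # event_id -> first-seen timestamp

    def is_duplicate(self, event_id, now=None):
        now = now if now is not None else time.time()
        # Evict expired entries so memory stays bounded.
        self.seen = {eid: ts for eid, ts in self.seen.items()
                     if now - ts < self.ttl}
        if event_id in self.seen:
            return True
        self.seen[event_id] = now
        return False

dedup = EventDeduplicator(ttl_seconds=60)
assert dedup.is_duplicate("evt-123") is False  # first delivery passes
assert dedup.is_duplicate("evt-123") is True   # client retry is dropped
```

The TTL bounds both memory use and the dedup guarantee: a retry arriving after the window will still double-count, which is an accepted tradeoff against unbounded state.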
Architecture
The architecture treats client-side and server-side events as fundamentally different data sources with different reliability guarantees. Both flow into the same data warehouse, but they feed different dashboards.
Client-side events drive behavioral analytics: page views, click patterns, feature usage, scroll depth. These metrics are useful for UX research and product exploration but are understood to be lossy.
Server-side events drive business metrics: signups, revenue, conversions, churn. These are emitted by the systems of record (auth, order, payment services) and cannot be blocked by ad blockers or lost to network errors.
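What "emitted by the system of record" looks like in practice can be sketched as follows; the `emit_event` helper, topic name, and envelope fields here are illustrative assumptions, not a specific library's API:

```python
import datetime
import json
import uuid

def emit_event(queue, event_type, payload):
    """Emit a business event from the service that owns the data.
    The event is produced by the same code path that performs the
    state change, so it cannot be lost to ad blockers or client-side
    network failures."""
    envelope = {
        "event_id": str(uuid.uuid4()),  # stable ID for downstream dedup
        "type": event_type,
        "emitted_at": datetime.datetime.now(
            datetime.timezone.utc).isoformat(),
        "payload": payload,
    }
    queue.produce(topic="business_events",
                  key=payload.get("user_id"),
                  value=json.dumps(envelope))
    return envelope
```

Keying by user ID keeps all events for one user on the same partition, preserving per-user ordering for downstream consumers.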
Key Engineering Challenges
Event Collection Reliability
The collection endpoint is the first point where data can be lost at scale. It must handle burst traffic, validate event schemas, and write to the queue without dropping events under load.
```python
class EventCollector:
    # queue, validator, rate_limiter, and metrics are injected
    # infrastructure clients; Response is the web framework's type.
    def __init__(self, queue, validator, rate_limiter, metrics):
        self.queue = queue
        self.validator = validator
        self.rate_limiter = rate_limiter
        self.metrics = metrics

    def collect(self, event):
        # Shed load per source before doing any other work.
        if not self.rate_limiter.allow(event.source_id):
            return Response(status=429)

        # Count and reject invalid events; never drop them silently.
        validation = self.validator.validate(event)
        if not validation.ok:
            self.metrics.increment("events.invalid",
                                   tags={"reason": validation.error})
            return Response(status=400)

        # Key by session ID so downstream processing stays ordered
        # within each session.
        self.queue.produce(
            topic="raw_events",
            key=event.session_id,
            value=event.serialize(),
        )
        self.metrics.increment("events.collected",
                               tags={"type": event.type})
        return Response(status=202)
```
The collector validates schemas before queuing - invalid events are counted and rejected, not silently dropped. Rate limiting prevents a single misbehaving client from flooding the pipeline. Events are keyed by session ID for ordered processing downstream.
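The rate limiter itself is unspecified above; one plausible shape is a per-source token bucket, sketched here as an assumption rather than a prescribed implementation:

```python
import time

class TokenBucketLimiter:
    """Per-source token bucket: each source_id gets `capacity` tokens,
    refilled at `rate` tokens per second. A burst drains the bucket
    and further events are rejected until tokens refill."""

    def __init__(self, capacity=100, rate=10.0):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}  # source_id -> (tokens, last_refill_ts)

    def allow(self, source_id, now=None):
        now = now if now is not None else time.time()
        tokens, last = self.buckets.get(source_id, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[source_id] = (tokens, now)
            return False
        self.buckets[source_id] = (tokens - 1, now)
        return True
```

The per-source keying matters: a single misbehaving client drains only its own bucket, so well-behaved clients are unaffected during the incident.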
Server-Side Attribution
Moving attribution to the server requires capturing UTM parameters and referrer data at the earliest authenticated touchpoint and persisting it with the user record.
```javascript
app.post("/api/signup", async (req, res) => {
  // Capture attribution at the earliest authenticated touchpoint.
  const attribution = {
    source: req.body.utm_source || null,
    medium: req.body.utm_medium || null,
    campaign: req.body.utm_campaign || null,
    referrer: req.headers.referer || null,
    landing_page: req.body.landing_page || null,
  };

  // Persist attribution with the user record, the system of record.
  const user = await createUser({
    email: req.body.email,
    attribution,
  });

  // Emit the server-side signup event for the analytics pipeline.
  events.emit("user.signup_completed", {
    user_id: user.id,
    ...attribution,
    timestamp: new Date().toISOString(),
  });

  res.status(201).json({ id: user.id });
});
```
This is not perfect - attribution is still lost for users who visit without UTMs and return later. But the gap is quantifiable. Running daily reconciliation between analytics attribution and server-side attribution produces a measurement error percentage that the team can track and discuss honestly.
Pipeline Reconciliation
The reconciliation query compares analytics-reported signups against the user database. The gap between them is your measurement error.
```sql
WITH analytics AS (
    SELECT DATE(timestamp) AS day, COUNT(*) AS count
    FROM analytics.events
    WHERE event = 'signup_completed'
    GROUP BY DATE(timestamp)
),
server AS (
    SELECT DATE(created_at) AS day, COUNT(*) AS count
    FROM users
    GROUP BY DATE(created_at)
)
SELECT
    s.day,
    s.count AS actual,
    COALESCE(a.count, 0) AS tracked,
    ROUND(100.0 * (s.count - COALESCE(a.count, 0))
          / s.count, 1) AS gap_pct
FROM server s
LEFT JOIN analytics a ON a.day = s.day
ORDER BY s.day DESC;
```
When this gap is consistently 20-30%, which is common for technical audiences with high ad blocker usage, every metric derived from client-side events is wrong by at least that margin. Publishing this gap as a team metric forces honest conversation about data quality.
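Publishing the gap can be automated as a daily check against an agreed threshold; a small sketch, where the 10% default and the result shape are assumptions for illustration:

```python
def check_tracking_gap(actual_signups, tracked_signups, threshold_pct=10.0):
    """Compare server-side truth against analytics counts and flag
    days where the measurement error exceeds the agreed threshold."""
    if actual_signups == 0:
        return {"gap_pct": 0.0, "alert": False}
    gap_pct = round(100.0 * (actual_signups - tracked_signups)
                    / actual_signups, 1)
    return {"gap_pct": gap_pct, "alert": gap_pct > threshold_pct}

# A day with 100 real signups but only 74 tracked ones:
print(check_tracking_gap(100, 74))  # {'gap_pct': 26.0, 'alert': True}
```

Wiring this into the same alerting system as error rates makes the data-quality conversation routine rather than quarterly.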
Design Tradeoffs
Treating client-side analytics as behavioral signals rather than business truth changes how teams make decisions. It is a harder conversation but produces better outcomes.
Server-side source of truth over client-side analytics. This means accepting that some behavioral data (page views, scroll depth) will be lossy while business metrics (revenue, signups) are exact. Teams accustomed to one dashboard must learn to use two.
Accepting attribution gaps over false precision. Reporting "23% unknown source" is more honest than misattributing those conversions to direct traffic. This changes marketing budget conversations but produces more accurate ROI analysis.
Schema validation at collection over late-stage cleanup. Rejecting malformed events at the collector increases error rates in the short term but prevents corrupted data from propagating through the entire pipeline.
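Collection-time validation can be as simple as checking required fields and types; a minimal sketch of what the collector's validator might look like, where the field list is an assumed schema, not a standard:

```python
class ValidationResult:
    def __init__(self, ok, error=None):
        self.ok = ok
        self.error = error

REQUIRED_FIELDS = {  # assumed minimal event schema
    "type": str,
    "session_id": str,
    "timestamp": str,
}

def validate_event(event: dict) -> ValidationResult:
    """Reject malformed events at the collector instead of cleaning
    them up downstream: a bad event never enters the pipeline, and
    each rejection reason becomes a countable metric tag."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            return ValidationResult(False, error=f"missing_{field}")
        if not isinstance(event[field], expected_type):
            return ValidationResult(False, error=f"bad_type_{field}")
    return ValidationResult(True)
```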
Lessons Learned
The gap between analytics numbers and financial numbers is a system health metric. When it grows, something in the pipeline changed. Monitor it like error rates.
Ad blocker prevalence in technical audiences makes client-side analytics fundamentally unreliable for B2B and developer-focused products. Building strategy around these numbers without server-side validation leads to systematically wrong decisions.
Schema evolution in event pipelines is harder than in APIs. Changing an event schema affects every downstream consumer, and those consumers may be maintained by different teams with different release cycles. Version events from the start.
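Versioning from the start can mean an explicit schema_version field plus an upgrade step that normalizes old shapes before consumers see them; a sketch under those assumptions (the v1-to-v2 field change here is invented for illustration):

```python
def upgrade_event(event: dict) -> dict:
    """Normalize any known event version to the latest shape so
    downstream consumers only ever handle one schema."""
    version = event.get("schema_version", 1)  # legacy events lack the field
    if version == 1:
        # Hypothetical change: v1 used a flat utm_source field,
        # v2 nests it under an attribution object.
        event = {
            "schema_version": 2,
            "type": event["type"],
            "attribution": {"source": event.get("utm_source")},
        }
    return event
```

Running the upgrade in one shared library decouples producer releases from the release cycles of every consuming team.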
Reconciliation is infrastructure, not reporting. Building automated reconciliation between data sources catches drift early. Manual reconciliation happens quarterly at best, by which point months of decisions were based on wrong data.