Data Flow Architecture

This section explains how data flows through Temps for analytics, error tracking, session replay, and monitoring.

Analytics Data Flow

Request Analytics Pipeline

Every HTTP request generates an analytics event:

1. Request arrives at Pingora

2. ProxyContext captures metadata
   ├── Method, path, headers
   ├── Client IP, User-Agent
   ├── Visitor ID, Session ID
   └── Timestamps

3. Request routed to upstream

4. Response received
   ├── Status code
   ├── Response time
   ├── Content type
   └── Response headers

5. ProxyLogService creates event (see the sketch below)
   └── CreateProxyLogRequest struct

6. Event inserted into database
   └── proxy_logs table (TimescaleDB hypertable)

7. Dashboard queries events
   ├── Real-time display
   ├── Historical analysis
   ├── Funnels and paths
   └── Performance metrics
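
A rough sketch of steps 5 and 6: assembling a CreateProxyLogRequest and writing it to the hypertable. The field set and SQL here are illustrative (the real struct carries more of the data shown under Captured Event Data below), and sqlx is assumed as the database client.

use chrono::{DateTime, Utc};
use sqlx::PgPool;

// Hypothetical subset of the fields ProxyLogService records per request.
pub struct CreateProxyLogRequest {
    pub project_id: i32,
    pub deployment_id: i32,
    pub method: String,
    pub path: String,
    pub status: i16,
    pub duration_ms: i64,
    pub visitor_id: Option<String>,
    pub session_id: Option<String>,
    pub timestamp: DateTime<Utc>,
}

// Step 6: insert the event into the proxy_logs hypertable.
pub async fn insert_proxy_log(
    pool: &PgPool,
    req: &CreateProxyLogRequest,
) -> Result<(), sqlx::Error> {
    sqlx::query(
        r#"INSERT INTO proxy_logs
           (project_id, deployment_id, method, path, status, duration_ms,
            visitor_id, session_id, "timestamp")
           VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)"#,
    )
    .bind(req.project_id)
    .bind(req.deployment_id)
    .bind(&req.method)
    .bind(&req.path)
    .bind(req.status)
    .bind(req.duration_ms)
    .bind(&req.visitor_id)
    .bind(&req.session_id)
    .bind(req.timestamp)
    .execute(pool)
    .await?;
    Ok(())
}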

Captured Event Data

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T10:30:45.123Z",
  "project_id": 5,
  "deployment_id": 42,
  "method": "GET",
  "path": "/api/users",
  "query_string": "page=1&limit=10",
  "status": 200,
  "duration_ms": 45,
  "request_size": 256,
  "response_size": 5120,
  "ip_address": "203.0.113.45",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
  "referrer": "https://example.com/dashboard",
  "visitor_id": "v_xyz789",
  "session_id": "s_abc123",
  "content_type": "application/json",
  "cache_status": "MISS",
  "request_headers": {
    "Accept": "application/json",
    "Authorization": "Bearer ..."
  }
}

Error Tracking Data Flow

Error Capture to Storage

1. Error occurs in user app
   ├── Uncaught exception
   ├── API error
   └── Runtime error

2. Error SDK captures
   ├── Stack trace
   ├── Error message
   ├── Context (URL, user, etc.)
   └── Source code line

3. Send to Temps API
   └── POST /api/errors

4. Error Service processes
   ├── Extract error type
   ├── Generate fingerprint (see the sketch below)
   ├── Group similar errors
   └── Deduplicate

5. Store in database
   └── errors table

6. Trigger notifications
   ├── Email to team
   ├── Slack message
   └── Webhook call

7. Dashboard aggregates
   ├── Error groups
   ├── Trend analysis
   ├── Stack traces
   └── Environment context

Error Event Structure

pub struct ErrorEvent {
    pub project_id: i32,
    pub deployment_id: Option<i32>,
    pub error_type: String,           // e.g., "TypeError"
    pub error_message: String,
    pub stack_trace: String,
    pub fingerprint: String,          // For grouping
    pub environment: String,          // dev, prod, etc.
    pub user_id: Option<String>,
    pub user_email: Option<String>,
    pub context: serde_json::Value,   // Additional context
    pub source_map: Option<String>,   // For transpiled JS code
    pub first_seen: DateTime<Utc>,
    pub last_seen: DateTime<Utc>,
    pub occurrence_count: i64,
}
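
The fingerprint field is what drives step 4's grouping and deduplication. A minimal sketch of one plausible scheme, hashing the error type together with the topmost application frame (the sha2 and hex crates are assumed; the normalization Temps actually applies may differ):

use sha2::{Digest, Sha256};

// One plausible fingerprinting scheme: hash the error type plus the topmost
// application frame so repeated occurrences collapse into a single group.
pub fn fingerprint(error_type: &str, stack_trace: &str) -> String {
    // First frame that does not come from dependencies (heuristic).
    let top_frame = stack_trace
        .lines()
        .map(str::trim)
        .find(|line| line.starts_with("at ") && !line.contains("node_modules"))
        .unwrap_or("");

    let mut hasher = Sha256::new();
    hasher.update(error_type.as_bytes());
    hasher.update(top_frame.as_bytes());
    hex::encode(hasher.finalize())
}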

Session Replay Data Flow

Recording to Playback

1. User visits web app

2. Analytics SDK initializes
   └── Loads session replay recorder

3. Record DOM mutations
   ├── Element additions
   ├── Element removals
   ├── Text changes
   ├── Attribute changes
   └── Style changes

4. Record user interactions
   ├── Clicks
   ├── Form inputs
   ├── Scrolls
   ├── Keyboard events
   └── Touch events

5. Buffer events in memory
   ├── Capture a snapshot every 100ms
   ├── Store mutations between snapshots
   └── Keep ~5 minutes of data

6. Periodically send to server
   ├── POST /api/session-replay
   ├── Batch multiple events
   ├── Compress with gzip
   └── Retry on failure

7. Store in database (see the sketch below)
   └── session_replay_events table

8. Dashboard replays
   ├── Load stored events
   ├── Rebuild DOM state
   ├── Play mutations frame-by-frame
   ├── Sync with console logs
   └── Show user interactions
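
On the server side (steps 6 and 7), the ingest endpoint reverses the client's batching and compression before writing rows. A rough sketch assuming gzip-compressed JSON batches, the flate2 and serde crates, and an illustrative column layout for session_replay_events:

use std::io::Read;

use flate2::read::GzDecoder;
use serde::Deserialize;
use sqlx::PgPool;

// Shape of one batched replay event as sent by the SDK (illustrative).
#[derive(Deserialize)]
struct ReplayEvent {
    session_id: String,
    visitor_id: String,
    project_id: i32,
    timestamp: chrono::DateTime<chrono::Utc>,
    #[serde(rename = "type")]
    event_type: String,
    mutations: serde_json::Value,
}

// POST /api/session-replay body: a gzip-compressed JSON array of events.
pub async fn ingest_replay_batch(pool: &PgPool, body: &[u8]) -> anyhow::Result<()> {
    // Undo the client-side gzip compression.
    let mut json = String::new();
    GzDecoder::new(body).read_to_string(&mut json)?;

    let events: Vec<ReplayEvent> = serde_json::from_str(&json)?;

    // Store each event in the session_replay_events table.
    for ev in &events {
        sqlx::query(
            r#"INSERT INTO session_replay_events
               (session_id, visitor_id, project_id, "timestamp", event_type, mutations)
               VALUES ($1, $2, $3, $4, $5, $6)"#,
        )
        .bind(&ev.session_id)
        .bind(&ev.visitor_id)
        .bind(ev.project_id)
        .bind(ev.timestamp)
        .bind(&ev.event_type)
        .bind(&ev.mutations)
        .execute(pool)
        .await?;
    }
    Ok(())
}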

Session Replay Event

{
  "session_id": "s_abc123",
  "visitor_id": "v_xyz789",
  "project_id": 5,
  "timestamp": "2024-01-15T10:30:45.123Z",
  "type": "mutation",
  "mutations": [
    {
      "type": "added_node",
      "id": 42,
      "parent_id": 41,
      "tag": "div",
      "attributes": {
        "class": "new-element",
        "data-testid": "modal"
      }
    },
    {
      "type": "text_update",
      "id": 43,
      "text": "User clicked button"
    },
    {
      "type": "attribute_update",
      "id": 44,
      "attribute": "aria-hidden",
      "value": "false"
    }
  ]
}

Performance Metrics Data Flow

Monitoring Application Performance

1. User loads web app

2. Browser measures Core Web Vitals
   ├── LCP (Largest Contentful Paint)
   ├── FID (First Input Delay)
   ├── CLS (Cumulative Layout Shift)
   └── FCP (First Contentful Paint)

3. Analytics SDK collects metrics
   ├── Navigation timing
   ├── Resource timing
   ├── Custom metrics
   └── Error rates

4. Send to Temps Analytics API
   ├── POST /api/analytics/metrics
   ├── Batch with other events
   └── Include device/browser info

5. Store in database
   └── performance_metrics table (TimescaleDB)

6. Dashboard displays
   ├── Performance trends
   ├── Device/browser comparison
   ├── Slow page detection
   └── Alerts on degradation

Performance Metric Event

{
  "project_id": 5,
  "session_id": "s_abc123",
  "timestamp": "2024-01-15T10:30:45.123Z",
  "page": "/dashboard",
  "metrics": {
    "lcp": 1250,              // ms
    "fid": 150,               // ms
    "cls": 0.05,              // unitless
    "fcp": 800,               // ms
    "ttfb": 200,              // Time to First Byte (ms)
    "navigation_start": 0,
    "load_event_end": 2000
  },
  "device": {
    "type": "desktop",
    "os": "Windows 10",
    "browser": "Chrome 120"
  }
}
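
When the dashboard reads these samples back, it can push the aggregation down into TimescaleDB. A sketch of a daily p75 LCP query for one project, assuming performance_metrics has lcp, project_id, and timestamp columns matching the event above:

use sqlx::{PgPool, Row};

// Daily 75th-percentile LCP for a project over the last 30 days.
pub async fn daily_p75_lcp(
    pool: &PgPool,
    project_id: i32,
) -> Result<Vec<(chrono::DateTime<chrono::Utc>, f64)>, sqlx::Error> {
    let rows = sqlx::query(
        r#"SELECT time_bucket('1 day', "timestamp") AS day,
                  percentile_cont(0.75) WITHIN GROUP (ORDER BY lcp) AS p75_lcp
           FROM performance_metrics
           WHERE project_id = $1 AND "timestamp" > now() - interval '30 days'
           GROUP BY day
           ORDER BY day"#,
    )
    .bind(project_id)
    .fetch_all(pool)
    .await?;

    Ok(rows
        .into_iter()
        .map(|row| (row.get("day"), row.get("p75_lcp")))
        .collect())
}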

Funnel Analysis Data Flow

Converting Events to Funnels

1. User specifies funnel definition
   ├── Step 1: /pricing (view page)
   ├── Step 2: /signup (view page)
   ├── Step 3: signed_up (custom event)
   └── Step 4: /dashboard (view page)

2. System queries events
   └── Find all sessions matching pattern

3. Analyze conversion
   ├── Step 1: 1000 sessions
   ├── Step 2: 800 sessions (80% conversion from step 1)
   ├── Step 3: 600 sessions (75% conversion from step 2)
   └── Step 4: 500 sessions (83% conversion from step 3)

4. Calculate dropout
   ├── Lost 200 sessions between step 1 and 2
   ├── Lost 200 sessions between step 2 and 3
   ├── Lost 100 sessions between step 3 and 4
   └── Overall: 50% conversion

5. Display in dashboard
   ├── Visual funnel diagram
   ├── Conversion rates
   ├── Dropout analysis
   ├── Time between steps
   └── Segment by user properties
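
The arithmetic behind steps 3 and 4 is simple once the session count per step is known. A small self-contained sketch using the example numbers above:

// Per-step funnel results derived from raw session counts.
#[derive(Debug)]
pub struct FunnelStep {
    pub name: String,
    pub sessions: u64,
    pub conversion_from_previous: f64, // fraction of the previous step
    pub dropped_from_previous: u64,
}

pub fn analyze_funnel(steps: &[(&str, u64)]) -> (Vec<FunnelStep>, f64) {
    let mut results = Vec::new();
    for (i, &(name, sessions)) in steps.iter().enumerate() {
        let prev = if i == 0 { sessions } else { steps[i - 1].1 };
        results.push(FunnelStep {
            name: name.to_string(),
            sessions,
            conversion_from_previous: sessions as f64 / prev as f64,
            dropped_from_previous: prev.saturating_sub(sessions),
        });
    }
    // Overall conversion: last step relative to the first.
    let overall = steps.last().map(|&(_, s)| s).unwrap_or(0) as f64
        / steps.first().map(|&(_, s)| s).unwrap_or(1) as f64;
    (results, overall)
}

fn main() {
    // Example from the flow above: 1000 -> 800 -> 600 -> 500 sessions.
    let (steps, overall) = analyze_funnel(&[
        ("/pricing", 1000),
        ("/signup", 800),
        ("signed_up", 600),
        ("/dashboard", 500),
    ]);
    for s in &steps {
        println!(
            "{:<12} {:>5} sessions ({:.0}% of previous, lost {})",
            s.name, s.sessions, s.conversion_from_previous * 100.0, s.dropped_from_previous
        );
    }
    println!("Overall conversion: {:.0}%", overall * 100.0); // 50%
}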

Visitor Segmentation Data Flow

Building User Cohorts

1. User defines segment
   ├── Browser = Chrome
   ├── Country = USA
   ├── Has signed up
   └── Last seen in last 7 days

2. Query database
   ├── Find visitors matching criteria
   ├── Get their sessions
   ├── Calculate properties
   └── Group by dimension

3. Calculate metrics
   ├── Segment size
   ├── Bounce rate
   ├── Average session duration
   ├── Pages per session
   └── Conversion rate

4. Display results
   ├── Size breakdown
   ├── Comparative metrics
   ├── Trend over time
   └── Behavior analysis
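
A sketch of how a segment definition like the one above could be represented as data; the field names are illustrative, not the actual Temps schema:

use serde::{Deserialize, Serialize};

// Illustrative segment definition: every filter is optional and the
// filters are ANDed together when the visitor query is built.
#[derive(Serialize, Deserialize, Debug)]
pub struct SegmentDefinition {
    pub name: String,
    pub browser: Option<String>,       // e.g. "Chrome"
    pub country: Option<String>,       // e.g. "USA"
    pub has_signed_up: Option<bool>,
    pub last_seen_within_days: Option<u32>,
}

fn main() {
    let segment = SegmentDefinition {
        name: "Active US Chrome signups".into(),
        browser: Some("Chrome".into()),
        country: Some("USA".into()),
        has_signed_up: Some(true),
        last_seen_within_days: Some(7),
    };
    // Serialized form, e.g. as stored or sent to a segments endpoint.
    println!("{}", serde_json::to_string_pretty(&segment).unwrap());
}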

Real-Time Dashboard Updates

WebSocket-Based Updates

1. Client connects to dashboard
   ├── Opens WebSocket connection
   ├── Subscribes to project_id
   └── Subscribes to metric types

2. Proxy logs new event (sketched below)
   ├── Event inserted into database
   ├── Database trigger issues NOTIFY
   ├── Temps receives the notification
   └── Event broadcast to connected clients

3. Updates appear in dashboard
   ├── Page view count updates
   ├── Live visitor count
   ├── Error count badge
   ├── Performance metrics
   └── No page refresh needed
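
A condensed sketch of the NOTIFY-to-broadcast path in step 2, assuming a trigger that NOTIFYs a proxy_log_inserted channel with the new row as JSON and a tokio broadcast channel fanning messages out to WebSocket handlers; the channel name and payload format are assumptions:

use sqlx::postgres::PgListener;
use tokio::sync::broadcast;

// Fan database notifications out to every connected dashboard client.
pub async fn forward_proxy_log_events(
    database_url: &str,
    tx: broadcast::Sender<String>,
) -> Result<(), sqlx::Error> {
    // LISTEN on the channel a trigger NOTIFYs after each proxy_logs insert.
    let mut listener = PgListener::connect(database_url).await?;
    listener.listen("proxy_log_inserted").await?;

    loop {
        let notification = listener.recv().await?;
        // Payload is the new event serialized as JSON by the trigger.
        // A send error only means no client is currently subscribed.
        let _ = tx.send(notification.payload().to_string());
    }
}

// Each WebSocket connection holds a broadcast::Receiver and pushes every
// message it receives to the browser, so counters update without a refresh.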

Data Retention

Storage Strategy

Real-time data (24 hours)
  ├── All events stored at full resolution
  ├── Quick access for debugging
  └── High write throughput

Short-term data (7 days)
  ├── Aggregated by hour
  ├── Details available on demand
  └── Moderate compression

Medium-term data (30 days)
  ├── Aggregated by day
  ├── Trends and patterns
  └── Heavy compression

Long-term data (1 year)
  ├── Only summary statistics
  ├── Archived in cold storage
  └── For annual reports
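
Much of this tiering can be expressed as TimescaleDB policies. A sketch of an hourly rollup (continuous aggregate) plus a retention policy on the raw table, kept as migration SQL; the intervals mirror the tiers above but are illustrative, not the shipped defaults:

// SQL applied as a one-time migration (for example via sqlx::migrate! or psql).
pub const RETENTION_MIGRATION: &str = r#"
-- Hourly rollup backing the 7-day tier.
CREATE MATERIALIZED VIEW IF NOT EXISTS proxy_logs_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', "timestamp") AS bucket,
       project_id,
       count(*)         AS requests,
       avg(duration_ms) AS avg_duration_ms
FROM proxy_logs
GROUP BY bucket, project_id;

-- Drop full-resolution rows once they leave the real-time tier.
SELECT add_retention_policy('proxy_logs', INTERVAL '24 hours', if_not_exists => TRUE);
"#;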

TimescaleDB Hypertables

Temps uses TimescaleDB for time-series optimization:

Features:

  • Automatic chunking - Data split by time intervals (daily chunks)
  • Automatic compression - Old data automatically compressed for storage efficiency
  • Optimized queries - Fast time-range queries with automatic index selection
  • Parallel processing - Large aggregations processed across chunks

Result: Queries on 24 hours of data are instant, even with millions of events. Queries on 1 year of data complete in seconds.
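
The corresponding setup for proxy_logs, sketched as the SQL a migration would run; the chunk interval and compression age are illustrative:

// Migration SQL turning proxy_logs into a hypertable with compression.
pub const HYPERTABLE_MIGRATION: &str = r#"
-- Daily chunks keyed on the event timestamp.
SELECT create_hypertable('proxy_logs', 'timestamp', chunk_time_interval => INTERVAL '1 day');

-- Compress aging chunks, segmented by project for fast per-project scans.
ALTER TABLE proxy_logs SET (timescaledb.compress, timescaledb.compress_segmentby = 'project_id');
SELECT add_compression_policy('proxy_logs', INTERVAL '7 days');
"#;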

Monitoring the System

System Health Metrics

Database Performance
  ├── Query latency
  ├── Connection pool usage
  ├── Slow query log
  └── Table sizes

Proxy Performance
  ├── Request rate (req/sec)
  ├── Response time (p50, p95, p99)
  ├── Error rate (4xx, 5xx)
  ├── Upstream latency
  └── Certificate cache hits

Application Performance
  ├── Memory usage
  ├── CPU usage
  ├── Async task count
  ├── File descriptors
  └── Error rates
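
A minimal, dependency-free sketch of the kind of counters behind the proxy metrics above; a production setup would more likely export these through Prometheus-style instrumentation:

use std::sync::atomic::{AtomicU64, Ordering};

// Shared counters the proxy increments on every request; a metrics
// endpoint or exporter reads them periodically.
#[derive(Default)]
pub struct ProxyMetrics {
    pub requests_total: AtomicU64,
    pub responses_4xx: AtomicU64,
    pub responses_5xx: AtomicU64,
    pub upstream_latency_ms_sum: AtomicU64,
}

impl ProxyMetrics {
    pub fn record(&self, status: u16, upstream_latency_ms: u64) {
        self.requests_total.fetch_add(1, Ordering::Relaxed);
        match status {
            400..=499 => {
                self.responses_4xx.fetch_add(1, Ordering::Relaxed);
            }
            500..=599 => {
                self.responses_5xx.fetch_add(1, Ordering::Relaxed);
            }
            _ => {}
        }
        self.upstream_latency_ms_sum
            .fetch_add(upstream_latency_ms, Ordering::Relaxed);
    }

    // Error rate as a fraction of all requests seen so far.
    pub fn error_rate(&self) -> f64 {
        let total = self.requests_total.load(Ordering::Relaxed).max(1);
        let errors = self.responses_4xx.load(Ordering::Relaxed)
            + self.responses_5xx.load(Ordering::Relaxed);
        errors as f64 / total as f64
    }
}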

Data Privacy

Sensitive Data Handling

PII Protection
  ├── IP addresses captured for geographic info
  ├── User email optional (for notifications)
  ├── Custom data depends on user implementation
  └── Client-side control with analytics SDK

Data Encryption
  ├── Secrets encrypted with AES-GCM
  ├── Password hashes with Argon2
  ├── TLS for all data in transit
  └── At-rest encryption available
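
A compact sketch of those two primitives using the aes-gcm and argon2 crates; the algorithms match the list above, but the crate choice and the surrounding key management are assumptions:

use aes_gcm::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    Aes256Gcm, Key,
};
use argon2::{
    password_hash::{rand_core::OsRng as PasswordRng, PasswordHash, PasswordHasher,
                    PasswordVerifier, SaltString},
    Argon2,
};

// Secrets: AES-256-GCM with a random 96-bit nonce stored alongside the ciphertext.
// The key would come from the server's key management, not be generated here.
fn encrypt_secret(key: &Key<Aes256Gcm>, plaintext: &[u8]) -> (Vec<u8>, Vec<u8>) {
    let cipher = Aes256Gcm::new(key);
    let nonce = Aes256Gcm::generate_nonce(&mut OsRng);
    let ciphertext = cipher.encrypt(&nonce, plaintext).expect("encryption failure");
    (nonce.to_vec(), ciphertext)
}

// Passwords: store the Argon2 hash in PHC string format, never the password.
fn hash_password(password: &str) -> String {
    let salt = SaltString::generate(&mut PasswordRng);
    Argon2::default()
        .hash_password(password.as_bytes(), &salt)
        .expect("hashing failure")
        .to_string()
}

fn verify_password(password: &str, stored_hash: &str) -> bool {
    let parsed = PasswordHash::new(stored_hash).expect("invalid stored hash");
    Argon2::default()
        .verify_password(password.as_bytes(), &parsed)
        .is_ok()
}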

Data Retention
  ├── User can configure retention policy
  ├── Automatic deletion after retention period
  ├── Manual deletion of specific events
  └── Data export capability
