Monitoring & Observability

Set up monitoring, centralized logging, and alerting for production CognitiveX deployments.

Key Metrics to Monitor

Application

  • Request rate
  • Response time
  • Error rate
  • Queue length

System

  • CPU usage
  • Memory usage
  • Disk I/O
  • Network I/O

Database

  • Query performance (latency and slow queries)
  • Connection pool utilization
  • Storage usage
  • Replication lag

Business

  • API usage
  • Token consumption
  • Cost per request
  • Cache hit rate
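
Several of the metrics above (error rate, cache hit rate, cost per request) are ratios derived from raw counters rather than values measured directly. The following is a minimal sketch of that derivation. The `WindowCounters` type, its field names, and the sample numbers are hypothetical and are not part of CognitiveX.

```python
from dataclasses import dataclass


@dataclass
class WindowCounters:
    """Raw counts collected over one observation window (hypothetical names)."""
    window_seconds: float
    requests: int
    errors: int
    cache_hits: int
    cache_misses: int
    total_cost_usd: float


def derive_metrics(c: WindowCounters) -> dict[str, float]:
    """Turn raw counters into rates and ratios, guarding against empty windows."""
    lookups = c.cache_hits + c.cache_misses
    return {
        "request_rate_per_s": c.requests / c.window_seconds,
        "error_rate": c.errors / c.requests if c.requests else 0.0,
        "cache_hit_rate": c.cache_hits / lookups if lookups else 0.0,
        "cost_per_request_usd": c.total_cost_usd / c.requests if c.requests else 0.0,
    }


# Illustrative one-minute window with made-up numbers.
sample = WindowCounters(window_seconds=60, requests=1200, errors=30,
                        cache_hits=900, cache_misses=300, total_cost_usd=2.4)
metrics = derive_metrics(sample)
```

Exporting the raw counters and computing the ratios at query time is usually the more flexible choice, because the same counters can then be aggregated over any window.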

Prometheus Setup

prometheus.yml
scrape_configs:
  - job_name: 'cognitivex'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'
    scrape_interval: 15s
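
The scrape job above expects the target to answer `GET /metrics` on port 3000 with the Prometheus text exposition format. As an illustration of what such an endpoint returns, here is a small sketch using only the Python standard library. The `REQUEST_COUNTS` store, the `method` label, and the handler are assumptions made for the example; a real service would more likely use an official Prometheus client library, and nothing here describes how CognitiveX itself exposes metrics.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters, keyed by (HTTP method, status code).
REQUEST_COUNTS = {("GET", "200"): 0, ("GET", "500"): 0}


def render_metrics(counts: dict[tuple[str, str], int]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only the path configured as metrics_path is served.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 3000) -> None:
    """Block forever, serving /metrics on the port the scrape job targets."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; `render_metrics` can be checked on its own without opening a socket.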

Grafana Dashboards

Import pre-built Grafana dashboards for CognitiveX monitoring:

  • Application Performance Dashboard
  • System Resources Dashboard
  • Database Metrics Dashboard
  • Business Metrics Dashboard

Centralized Logging

docker-compose.logging.yml
version: '3.8'
services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
      
  promtail:
    image: grafana/promtail:latest
    volumes:
      - /var/log:/var/log
      - ./promtail-config.yml:/etc/promtail/config.yml

Alerting Rules

groups:
  - name: cognitivex
    rules:
      - alert: HighErrorRate
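        # Share of requests that returned any 5xx status over the last 5 minutes
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05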
        for: 5m
        annotations:
          summary: "High error rate detected"
          
      - alert: HighMemoryUsage
        expr: memory_usage_percent > 90
        for: 5m
        annotations:
          summary: "Memory usage above 90%"
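
The `for: 5m` clause in these rules means an alert does not fire as soon as its expression becomes true. It first enters a pending state and fires only if the expression stays true at every evaluation for the full duration. The sketch below simulates that behavior in a simplified form; the `alert_states` function and the sample values are hypothetical and only illustrate the idea, not Prometheus internals.

```python
def alert_states(samples: list[tuple[int, float]],
                 threshold: float,
                 for_seconds: int) -> list[str]:
    """Return the alert state at each evaluation.

    samples holds (timestamp_seconds, value) pairs, one per rule evaluation.
    """
    states = []
    active_since = None  # timestamp at which the condition first became true
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            # Any evaluation below the threshold resets the timer.
            active_since = None
            states.append("inactive")
    return states


# Memory usage sampled once per minute, with a brief dip at t=120s.
values = [95, 95, 80, 95, 95, 95, 95, 95, 95]
samples = [(i * 60, v) for i, v in enumerate(values)]
states = alert_states(samples, threshold=90, for_seconds=300)
```

In this run the dip at 120 seconds resets the timer, so the alert fires only at the last sample, 300 seconds after the condition became true again.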

Best Practices

  • Set up alerts for critical metrics
  • Use structured logging (JSON format)
  • Implement distributed tracing
  • Monitor both technical and business metrics
  • Set up dashboards for different teams
  • Review metrics and alerts regularly
  • Document runbooks for common issues
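
As an illustration of the structured logging practice above, the following sketch emits one JSON object per log line using Python's standard `logging` module. The logger name and the `request_id` and `duration_ms` fields are assumptions chosen for the example, not a CognitiveX schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Local time by default; set a converter if UTC is required.
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in ("request_id", "duration_ms"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("cognitivex.example")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# Write to an in-memory buffer here; a service would typically use stdout.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("request completed", extra={"request_id": "abc123", "duration_ms": 42})
```

Because every line is valid JSON, a log pipeline can filter on fields such as `level` or `request_id` without fragile text parsing.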