Observability Plan

Status: Canonical
Last Updated: 2026-02-06
Owner: Engineering


Purpose

This document defines the observability requirements for CAIRL.

It establishes:

  • What MUST be observable
  • What MUST be logged, measured, and alerted on
  • How observability supports security, compliance, and reliability
  • Boundaries between application, database, and provider visibility

This document defines architectural expectations. Implementation details may evolve, but the guarantees stated here MUST hold.


Scope

This plan applies to:

  • All application services
  • All server-side code paths
  • All database access patterns
  • All third-party provider integrations
  • All compliance-relevant operations

Client-side observability is explicitly out of scope.


Core Observability Principles (Invariants)

  1. Security- and compliance-relevant actions are always observable.
  2. Sensitive data is never logged.
  3. Observability failures are treated as defects.
  4. Logs are structured, queryable, and retained intentionally.
  5. Access to observability data is restricted.

Observability Pillars

CAIRL observability is structured around three pillars:

  1. Logs – What happened
  2. Metrics – How often and how severe
  3. Traces – How a request flowed through the system

All three pillars MUST exist for critical paths.


Logging Requirements

What MUST Be Logged

The following events MUST be logged server-side:

  • Authentication and session lifecycle events
  • Authorization failures
  • RLS access denials
  • Access to HIPAA-regulated data
  • Admin actions and overrides
  • Rate limit and abuse guardrail triggers
  • Account suspension and restoration
  • Data deletion and retention actions
  • Provider errors and throttling events
  • Background job failures

What MUST NOT Be Logged

The following MUST NOT appear in logs:

  • PHI or HIPAA document contents
  • Biometric image data
  • Authentication secrets or tokens
  • Full request or response payloads from providers
  • Raw user-generated content unless explicitly approved

Logs MUST prefer identifiers over payloads.
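
One way to enforce "identifiers over payloads" is an allowlist applied before anything reaches the logger. The sketch below is illustrative only: the field names in ALLOWED_FIELDS and the helper to_log_fields are assumptions, not part of CAIRL.

```python
from typing import Any, Dict, Mapping

# Hypothetical allowlist; these field names are illustrative assumptions.
ALLOWED_FIELDS = frozenset(
    {"user_id", "document_id", "provider", "provider_request_id", "status_code"}
)


def to_log_fields(event: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted scalar fields; drop everything else."""
    return {
        key: value
        for key, value in event.items()
        # Nested structures are dropped so a payload cannot hide under an allowed key.
        if key in ALLOWED_FIELDS and isinstance(value, (str, int, float, bool))
    }
```

An allowlist fails closed: a newly added field stays out of the logs until someone approves it, whereas a denylist would leak it by default. It does not inspect values, so callers still need to pass identifiers rather than content under the allowed keys.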


Log Structure

All logs MUST include:

  • Timestamp (UTC)
  • Environment (dev, staging, prod)
  • Request or operation identifier
  • Actor identifier (user_id, admin_id, system)
  • Action type
  • Outcome (success, failure, blocked)
  • Reason code (where applicable)

Logs SHOULD be machine-parseable.
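
As an illustration of the required fields, a log record could be emitted as one JSON object per line. This is a minimal sketch, not a prescribed schema: the key names and the build_log_record helper are assumptions; only the field list and the enumerated values come from this section.

```python
import json
from datetime import datetime, timezone
from typing import Optional

VALID_ENVIRONMENTS = {"dev", "staging", "prod"}
VALID_OUTCOMES = {"success", "failure", "blocked"}


def build_log_record(
    environment: str,
    request_id: str,
    actor_id: str,
    action: str,
    outcome: str,
    reason_code: Optional[str] = None,
) -> str:
    """Serialize one log event as a JSON line containing the required fields."""
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # always UTC
        "environment": environment,
        "request_id": request_id,
        "actor_id": actor_id,  # user_id, admin_id, or the literal "system"
        "action": action,
        "outcome": outcome,
    }
    if reason_code is not None:
        record["reason_code"] = reason_code  # only where applicable
    return json.dumps(record, sort_keys=True)
```

Rejecting unknown environments and outcomes at write time keeps the values queryable later, because every record uses the same small vocabulary.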


Metrics Requirements

Core Metrics

The system MUST emit metrics for:

  • Request volume and error rates
  • Authorization and RLS denials
  • Rate limit hits and guardrail activations
  • Provider API latency and failure rates
  • Background job success and failure counts
  • Email forwarding caps reached
  • Phone guardrail blocks
  • Partner API usage and deduplication events

Metrics MUST be aggregated and non-identifying.
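
The "aggregated and non-identifying" rule can be checked mechanically by restricting which labels a metric may carry. The following is a sketch under assumptions: MetricsRegistry and the label names are hypothetical and stand in for whatever metrics library is actually used.

```python
from collections import Counter

# Hypothetical label allowlist; no per-user or per-record labels are permitted.
ALLOWED_LABELS = frozenset({"route", "outcome", "provider", "environment"})


class MetricsRegistry:
    """In-memory counters keyed by metric name plus low-cardinality labels."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def increment(self, name: str, **labels: str) -> None:
        unknown = set(labels) - ALLOWED_LABELS
        if unknown:
            # Refuse identifying labels such as user_id instead of silently dropping them.
            raise ValueError(f"labels not allowed on metrics: {sorted(unknown)}")
        self._counts[(name, tuple(sorted(labels.items())))] += 1

    def value(self, name: str, **labels: str) -> int:
        return self._counts[(name, tuple(sorted(labels.items())))]
```

For example, registry.increment("rls_denials", route="/documents", outcome="blocked") is accepted, while adding user_id="u_1" raises an error.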


Compliance Metrics

The following MUST be measurable:

  • HIPAA data access counts
  • Admin access frequency to regulated data
  • Retention job execution and failures
  • Deletion completion rates
  • Audit log growth and retention

Tracing Requirements

Request Tracing

For server-side requests:

  • A unique trace identifier MUST be generated
  • The identifier MUST propagate through:
    • Application logic
    • Database operations
    • Provider calls (where possible)

Tracing MUST support:

  • Latency analysis
  • Failure attribution
  • Dependency mapping
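
Trace propagation as described above might look like the following sketch. It uses Python's standard contextvars and uuid modules; the header name X-Trace-Id, the SQL-comment convention, and all function names are assumptions chosen for illustration, not CAIRL's actual mechanism.

```python
import contextvars
import uuid
from typing import Dict, Optional

# Holds the trace identifier for the current request context.
_trace_id: contextvars.ContextVar = contextvars.ContextVar("trace_id", default=None)


def start_trace() -> str:
    """Generate a unique trace identifier at the start of a server-side request."""
    trace_id = uuid.uuid4().hex
    _trace_id.set(trace_id)
    return trace_id


def current_trace_id() -> Optional[str]:
    return _trace_id.get()


def provider_headers() -> Dict[str, str]:
    """Headers for an outbound provider call (hypothetical header name)."""
    trace_id = current_trace_id()
    return {"X-Trace-Id": trace_id} if trace_id else {}


def sql_with_trace(sql: str) -> str:
    """Tag a statement with the trace id so database logs can be correlated."""
    trace_id = current_trace_id()
    # The id is hex-only, so embedding it in a SQL comment is safe.
    return f"{sql} /* trace_id={trace_id} */" if trace_id else sql
```

A context variable keeps the identifier scoped to the request that created it, including across async tasks started from that request, without threading it through every function signature.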

Background and Async Tracing

Background jobs MUST emit:

  • Start and completion events
  • Failure events with reason
  • Correlation to triggering action where applicable
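
The three job events listed above can be emitted by a thin wrapper around the job body. This is a minimal sketch: run_job, the event names, and the emit callback are assumptions for illustration.

```python
import time
from typing import Callable, Optional


def run_job(
    name: str,
    job: Callable[[], None],
    emit: Callable[[dict], None],
    triggered_by: Optional[str] = None,
) -> None:
    """Run a background job, emitting start, completion, and failure events."""
    base = {"job": name, "triggered_by": triggered_by}  # correlation to the trigger
    emit({**base, "event": "job_started"})
    started = time.monotonic()
    try:
        job()
    except Exception as exc:
        # Record the exception type as the reason; messages may contain sensitive data.
        emit({**base, "event": "job_failed", "reason": type(exc).__name__})
        raise
    emit(
        {
            **base,
            "event": "job_completed",
            "duration_s": round(time.monotonic() - started, 3),
        }
    )
```

The wrapper re-raises after emitting the failure event, so the scheduler's own retry or alerting behavior is unaffected.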

RLS and Database Observability

The following MUST be observable:

  • RLS policy denials
  • Use of service-role credentials
  • Access to compliance-regulated tables
  • Elevated access usage

Database observability MUST NOT expose sensitive row contents.


Provider Observability

For each third-party provider integration, the following MUST be observable:

  • Request counts
  • Error rates
  • Throttling or suppression events
  • Guardrail activations
  • Retry behavior

Logs SHOULD carry provider identifiers. Full provider payloads MUST NOT be logged (see "What MUST NOT Be Logged").
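
The provider signals above can be counted at the call site. The sketch below is hypothetical throughout: ProviderThrottled, call_provider, and the counter keys are illustrative names, and backoff between attempts is omitted to keep the example short.

```python
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


class ProviderThrottled(Exception):
    """Hypothetical error raised when a provider signals throttling."""


def call_provider(
    provider: str,
    operation: Callable[[], T],
    stats: Counter,
    max_attempts: int = 3,
) -> T:
    """Call a provider, counting requests, throttling, retries, and errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        stats[(provider, "requests")] += 1
        try:
            return operation()
        except ProviderThrottled:
            stats[(provider, "throttled")] += 1
            if attempt == max_attempts:
                raise  # retries exhausted
            stats[(provider, "retries")] += 1
        except Exception:
            stats[(provider, "errors")] += 1
            raise  # non-throttling errors are not retried in this sketch
    raise AssertionError("unreachable")
```

Only the provider name and event type are recorded, never the request or response body.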


Alerting Requirements

Mandatory Alerts

Alerts MUST exist for:

  • Sustained authorization or RLS failures
  • HIPAA access anomalies
  • Abuse guardrail spikes
  • Provider outages or throttling
  • Background job failures affecting retention or deletion
  • Unexpected growth in regulated data stores

Alerts MUST be actionable and routed to on-call owners.


Alert Hygiene

  • Alerts SHOULD avoid noise
  • Repeated alerts MUST be aggregated
  • Alert thresholds MUST be reviewed periodically
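
Aggregating repeated alerts can be as simple as suppressing duplicates of the same alert key within a time window while counting them. This sketch assumes a hypothetical AlertAggregator class and a caller-supplied clock; the window length is a placeholder, not a recommended threshold.

```python
from typing import Dict, List


class AlertAggregator:
    """Send one notification per alert key per window; count the repeats."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._open: Dict[str, List[float]] = {}  # key -> [first_seen, count]

    def record(self, key: str, now: float) -> bool:
        """Return True if a notification should be sent for this occurrence."""
        entry = self._open.get(key)
        if entry is not None and now - entry[0] < self.window:
            entry[1] += 1  # fold into the alert that is already open
            return False
        self._open[key] = [now, 1]  # new window: notify once
        return True

    def count(self, key: str) -> int:
        entry = self._open.get(key)
        return int(entry[1]) if entry else 0
```

The running count can be attached to the open alert so on-call owners see how often it fired without receiving a page for each occurrence.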

Access Control for Observability Data

  • Observability data is restricted to authorized roles
  • HIPAA-related logs are admin-only
  • Partner-related logs are scoped appropriately
  • Access to logs MUST itself be logged
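
The last rule, that access to logs is itself logged, can be illustrated with a reader that writes an audit entry before returning anything. The role names, the "hipaa" category label, and read_logs are assumptions made for this sketch; the authoritative role model lives in the authorization contract.

```python
from typing import Dict, List

# Hypothetical category label for HIPAA-related logs (admin-only per this section).
ADMIN_ONLY_CATEGORIES = frozenset({"hipaa"})


def read_logs(
    actor_id: str,
    role: str,
    category: str,
    store: Dict[str, List[dict]],
    access_audit: List[dict],
) -> List[dict]:
    """Return logs for a category, recording every attempt, allowed or not."""
    allowed = role == "admin" or category not in ADMIN_ONLY_CATEGORIES
    # Write the audit entry first so blocked attempts are captured too.
    access_audit.append(
        {
            "actor_id": actor_id,
            "action": "logs.read",
            "category": category,
            "outcome": "success" if allowed else "blocked",
        }
    )
    if not allowed:
        raise PermissionError(f"role {role!r} may not read {category!r} logs")
    return list(store.get(category, []))
```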

Retention of Observability Data

Observability data retention MUST comply with the data retention contract (docs/contracts/data-retention.new.md).

Minimum expectations:

  • Security and abuse logs: per abuse contract
  • HIPAA access logs: 6 years
  • General application logs: finite, documented window

Incident Support

Observability MUST support:

  • Root cause analysis
  • Audit evidence generation
  • Compliance reporting
  • Partner dispute resolution

Lack of observability is grounds to halt deployments.


Non-Negotiable Rules

  • Compliance visibility is mandatory.
  • Sensitive data is never logged.
  • Observability gaps are defects.
  • Logs are access-controlled.
  • Instrumentation is not optional.

References

  • docs/governance/doc-authority.new.md
  • docs/contracts/authz-and-roles.new.md
  • docs/contracts/data-retention.new.md
  • docs/contracts/rate-limits-and-abuse.new.md
  • docs/contracts/rls-standard.new.md
  • docs/architecture/system-overview.new.md

End of Document