Data Architecture

Status: Canonical (Draft)
Last Updated: 2026-02-06
Owner: Engineering


Purpose

This document defines the data architecture for CAIRL.

It describes:

  • How data is structured and classified
  • Where data lives (Postgres vs S3)
  • Ownership and access boundaries
  • How compliance-regulated data is isolated
  • How data architecture supports RLS, retention, and abuse controls

This document is descriptive. Enforcement rules are defined in the contracts listed under References.


Core Principles

  1. Postgres is the system of record.
  2. Ownership is explicit at the row level.
  3. Compliance-regulated data is isolated and auditable.
  4. Object storage is never authoritative on its own.
  5. Data architecture must support deletion, retention, and access logging.

Primary Data Stores

Supabase Postgres

Supabase Postgres is the authoritative datastore for:

  • User identities and profiles
  • Application state
  • Metadata for all uploaded objects
  • Compliance and audit records
  • Billing and partner events

All critical data MUST be represented in Postgres, even if large payloads are stored elsewhere.


AWS S3

AWS S3 is used for:

  • Large binary objects
  • Uploaded documents
  • HIPAA-regulated files
  • Derived assets (previews, exports)

Rules:

  • S3 objects MUST have a corresponding Postgres record
  • Buckets are private by default
  • Access is mediated via signed URLs
  • S3 lifecycle policies MUST align with retention contracts

S3 is never a source of truth by itself.
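
The pairing rule above can be checked mechanically. The following is a minimal sketch, assuming the bucket listing and the object keys recorded in Postgres have already been fetched into memory. The function name find_unpaired_objects and the sample keys are hypothetical.

```python
from typing import Iterable, NamedTuple


class Reconciliation(NamedTuple):
    orphaned_objects: set[str]  # present in S3, no Postgres record (violates the rule)
    dangling_records: set[str]  # present in Postgres, object missing from S3


def find_unpaired_objects(
    s3_keys: Iterable[str], recorded_keys: Iterable[str]
) -> Reconciliation:
    """Compare bucket contents against Postgres metadata (both pre-fetched)."""
    s3, pg = set(s3_keys), set(recorded_keys)
    return Reconciliation(orphaned_objects=s3 - pg, dangling_records=pg - s3)


# Hypothetical sample data for illustration only.
result = find_unpaired_objects(
    s3_keys=["docs/a.pdf", "docs/b.pdf", "tmp/stray.bin"],
    recorded_keys=["docs/a.pdf", "docs/b.pdf", "docs/c.pdf"],
)
```

Here result.orphaned_objects contains the stray S3 key and result.dangling_records contains the Postgres entry whose object is missing. A periodic job of this shape is one possible way to detect drift between the two stores.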


Data Classification Model

Every table and object MUST fall into one of the following categories.

User-Owned Data

Definition: Data owned by a specific user.

Characteristics:

  • Contains a user_id column
  • Subject to RLS
  • Deleted on account deletion unless a retention override applies

Examples:

  • Messages
  • User documents
  • Preferences
  • Mailboxes

Compliance-Regulated Data

Definition: Data subject to statutory or contractual retention.

Characteristics:

  • May survive account deletion
  • Access logging required
  • Retention windows enforced

Examples:

  • HIPAA documents
  • Biometric verification records
  • Audit logs
  • Billing records

System-Owned Data

Definition: Platform-controlled data not owned by a single user.

Characteristics:

  • Restricted access
  • Admin-only by default
  • Not user-deletable

Examples:

  • Allowlists
  • Suppression lists
  • Abuse reports
  • Partner configuration

Derived / Ephemeral Data

Definition: Non-authoritative data derived from other sources.

Characteristics:

  • Recomputable or discardable
  • Time-limited
  • Not relied on for correctness

Examples:

  • Caches
  • Previews
  • Temporary exports
  • Job state
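
The four categories above could be encoded as an enumeration with a registry that flags unclassified tables. This is a sketch only: the registry, the table names, and the unclassified_tables helper are hypothetical and not part of the CAIRL schema.

```python
from enum import Enum


class DataClass(Enum):
    USER_OWNED = "user_owned"
    COMPLIANCE_REGULATED = "compliance_regulated"
    SYSTEM_OWNED = "system_owned"
    DERIVED_EPHEMERAL = "derived_ephemeral"


# Hypothetical registry: table name -> classification.
TABLE_CLASSES = {
    "messages": DataClass.USER_OWNED,
    "audit_logs": DataClass.COMPLIANCE_REGULATED,
    "suppression_lists": DataClass.SYSTEM_OWNED,
    "preview_cache": DataClass.DERIVED_EPHEMERAL,
}


def unclassified_tables(all_tables, registry=TABLE_CLASSES):
    """Return tables that exist in the schema but have no classification."""
    return sorted(t for t in all_tables if t not in registry)


missing = unclassified_tables(["messages", "audit_logs", "new_feature"])
```

A check like this could run in CI against the live table list so that a new table cannot ship without a category.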

Table Design Standards

All Postgres tables MUST follow these rules:

  • Primary key is UUID
  • Ownership columns are explicit (for example, user_id or owner_id)
  • created_at and updated_at timestamps are present
  • Foreign keys are enforced
  • Cascades are intentional and documented

Soft deletes are permitted only if documented and justified.
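
Some of these standards lend themselves to an automated lint. The sketch below assumes table definitions are available as simple column maps. TableSpec and design_violations are hypothetical names, and the check covers only the UUID key, timestamp, and ownership-column rules (not foreign keys or cascades).

```python
from dataclasses import dataclass


@dataclass
class TableSpec:
    name: str
    columns: dict[str, str]                  # column name -> SQL type
    primary_key: str = "id"
    ownership_columns: tuple[str, ...] = ()  # columns declared as ownership


REQUIRED_TIMESTAMPS = ("created_at", "updated_at")


def design_violations(table: TableSpec) -> list[str]:
    """List the table design standards that this table does not meet."""
    problems = []
    if table.columns.get(table.primary_key, "").lower() != "uuid":
        problems.append("primary key must be a UUID column")
    for col in REQUIRED_TIMESTAMPS:
        if col not in table.columns:
            problems.append(f"missing {col}")
    for col in table.ownership_columns:
        if col not in table.columns:
            problems.append(f"declared ownership column {col} not present")
    return problems


# Hypothetical tables for illustration.
good = TableSpec(
    "messages",
    {"id": "uuid", "user_id": "uuid",
     "created_at": "timestamptz", "updated_at": "timestamptz"},
    ownership_columns=("user_id",),
)
bad = TableSpec("legacy_notes", {"id": "bigint", "created_at": "timestamptz"})
```

Calling design_violations(good) yields an empty list, while design_violations(bad) reports the non-UUID key and the missing updated_at column.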


Ownership and Relationships

  • User-owned tables MUST reference users(id)
  • Compliance-regulated tables MUST document ownership semantics
  • Many-to-many relationships MUST use join tables
  • Cross-user access MUST be intentional and auditable

Implicit ownership is prohibited.
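
The relationship rules above can be illustrated with a small schema. The sketch uses an in-memory SQLite database purely as a stand-in for Postgres so that it runs anywhere. The documents and document_shares tables are hypothetical, and TEXT columns stand in for UUIDs.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# SQLite needs this pragma; Postgres enforces foreign keys by default.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id TEXT PRIMARY KEY);
CREATE TABLE documents (
    id      TEXT PRIMARY KEY,
    user_id TEXT NOT NULL REFERENCES users(id)      -- explicit owner
);
CREATE TABLE document_shares (                      -- many-to-many join table
    document_id TEXT NOT NULL REFERENCES documents(id),
    shared_with TEXT NOT NULL REFERENCES users(id),
    PRIMARY KEY (document_id, shared_with)
);
""")

owner, reader = str(uuid.uuid4()), str(uuid.uuid4())
conn.execute("INSERT INTO users (id) VALUES (?)", (owner,))
conn.execute("INSERT INTO users (id) VALUES (?)", (reader,))
doc_id = str(uuid.uuid4())
conn.execute("INSERT INTO documents (id, user_id) VALUES (?, ?)", (doc_id, owner))
# Cross-user access is an explicit row, so it can be queried and audited.
conn.execute("INSERT INTO document_shares VALUES (?, ?)", (doc_id, reader))

# A document pointing at a non-existent user is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO documents (id, user_id) VALUES (?, ?)",
        (str(uuid.uuid4()), "no-such-user"),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Because sharing is represented as a row in a join table rather than inferred, every cross-user grant has a record that can be listed or revoked.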


HIPAA Data Isolation

HIPAA-regulated data MUST:

  • Be clearly labeled at the table and object level
  • Be isolated via RLS policies
  • Have access logged
  • Have retention windows enforced

HIPAA data MAY reside in the same database as other data but MUST be logically isolated from it.
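
The combination of owner-only access and access logging can be sketched as follows. In the real system the filtering would be done by RLS policies in Postgres; this Python version only illustrates the intended behavior. The in-memory rows, the log structure, and read_hipaa_document are all hypothetical.

```python
from datetime import datetime, timezone

ACCESS_LOG: list[dict] = []

# Hypothetical in-memory rows standing in for a HIPAA-labeled table.
HIPAA_DOCUMENTS = {
    "doc-1": {"user_id": "user-a", "object_key": "hipaa/doc-1.pdf"},
}


def read_hipaa_document(doc_id: str, actor_id: str):
    """Return the row only to its owner, and log every attempt."""
    row = HIPAA_DOCUMENTS.get(doc_id)
    allowed = row is not None and row["user_id"] == actor_id
    ACCESS_LOG.append({
        "doc_id": doc_id,
        "actor_id": actor_id,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return row if allowed else None


granted = read_hipaa_document("doc-1", "user-a")  # owner: row returned
denied = read_hipaa_document("doc-1", "user-b")   # other user: None
```

Both the granted and the denied attempt leave an entry in ACCESS_LOG, so failed access is as visible to reviewers as successful access.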


Biometric Data Handling

Biometric data:

  • Is compliance-regulated
  • Is stored as objects (S3) with metadata in Postgres
  • Uses rolling retention rules
  • Is fully deleted on account deletion

Biometric data MUST NOT be reused for non-verification purposes.
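
A cleanup job for biometric objects might select keys as sketched below. The 30-day window is a placeholder assumption, not a value from the retention contract, and keys_to_purge is a hypothetical helper operating on metadata rows already read from Postgres.

```python
from datetime import datetime, timedelta, timezone

ASSUMED_RETENTION = timedelta(days=30)  # placeholder, not from the retention contract


def keys_to_purge(records, now, account_deleted=False, retention=ASSUMED_RETENTION):
    """records: list of (object_key, created_at) pairs from Postgres metadata."""
    if account_deleted:
        # Account deletion removes all biometric objects regardless of age.
        return [key for key, _ in records]
    # Rolling retention: purge anything older than the window.
    return [key for key, created in records if now - created > retention]


now = datetime(2026, 2, 6, tzinfo=timezone.utc)
records = [
    ("bio/old.bin", now - timedelta(days=45)),
    ("bio/new.bin", now - timedelta(days=3)),
]
expired = keys_to_purge(records, now)
everything = keys_to_purge(records, now, account_deleted=True)
```

With this sample data, expired holds only the 45-day-old object, while everything holds both keys.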


Billing and Partner Data

Billing and partner data:

  • Is system-owned or compliance-regulated
  • Is not user-deletable
  • Has explicit retention windows
  • Is auditable for disputes

Partner events MUST be deduplicated at the data layer where applicable.
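
One common way to deduplicate at the data layer is a uniqueness constraint on the partner's event identifier. The sketch below uses in-memory SQLite as a stand-in for Postgres; the partner_events table and its columns are hypothetical. In Postgres the equivalent insert would use INSERT ... ON CONFLICT DO NOTHING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE partner_events (
        partner_id TEXT NOT NULL,
        event_id   TEXT NOT NULL,
        payload    TEXT NOT NULL,
        UNIQUE (partner_id, event_id)
    )
""")


def record_event(partner_id: str, event_id: str, payload: str) -> bool:
    """Insert the event once; return False when it is a duplicate delivery."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO partner_events VALUES (?, ?, ?)",
        (partner_id, event_id, payload),
    )
    return cur.rowcount == 1


first = record_event("partner-1", "evt-100", "{}")   # stored
second = record_event("partner-1", "evt-100", "{}")  # redelivery, ignored
```

Putting the constraint in the database rather than in application code means concurrent deliveries of the same event cannot both be stored.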


Data Access Patterns

  • Client access uses Supabase client with RLS
  • Server access uses least-privileged credentials
  • Service-role access is limited and logged
  • Bulk access requires explicit admin context

No data access pattern may bypass RLS except by deliberate, explicit design.
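
One way to make an RLS bypass deliberate is to require a stated reason and log it whenever a bypassing context is used. The sketch below is illustrative: the context names mirror the list above, but open_access, the log, and the assumption that bulk admin access bypasses RLS are hypothetical.

```python
from enum import Enum


class AccessContext(Enum):
    CLIENT = "client"
    SERVER = "server"
    SERVICE_ROLE = "service_role"
    BULK_ADMIN = "bulk_admin"


# Assumption: these contexts run without RLS filtering.
BYPASSES_RLS = {AccessContext.SERVICE_ROLE, AccessContext.BULK_ADMIN}

BYPASS_LOG: list[tuple[str, str]] = []


def open_access(context: AccessContext, reason: str = "") -> str:
    """Refuse RLS-bypassing access unless a reason is given, and log it."""
    if context in BYPASSES_RLS:
        if not reason.strip():
            raise PermissionError(f"{context.value} access requires a stated reason")
        BYPASS_LOG.append((context.value, reason))
        return "rls-bypassed"
    return "rls-enforced"


client_mode = open_access(AccessContext.CLIENT)
job_mode = open_access(AccessContext.SERVICE_ROLE, "retention cleanup job")
```

Calling open_access with a bypassing context and no reason raises PermissionError, so an accidental bypass fails loudly instead of silently succeeding.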


Data Lifecycle Alignment

Data architecture MUST support:

  • Account deletion semantics
  • Retention overrides
  • Anonymization where permitted
  • Automated cleanup jobs
  • Audit and compliance review

Schemas that prevent deletion or retention enforcement are invalid.
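
The interaction between account deletion and retention overrides can be sketched as a small decision function. The mapping below is an illustrative assumption; the authoritative rules live in the data retention contract. The function name and the string labels are hypothetical.

```python
from datetime import date
from typing import Optional

KNOWN_CLASSES = {
    "user_owned", "compliance_regulated", "system_owned", "derived_ephemeral",
}


def account_deletion_action(
    data_class: str, today: date, retain_until: Optional[date] = None
) -> str:
    """Decide what happens to a row when its owning account is deleted."""
    if data_class not in KNOWN_CLASSES:
        raise ValueError(f"unclassified data: {data_class}")
    if data_class == "system_owned":
        return "keep"    # not tied to a single user's lifecycle
    if retain_until is not None and today < retain_until:
        return "retain"  # retention window or override still active
    return "delete"


today = date(2026, 2, 6)
actions = {
    "message": account_deletion_action("user_owned", today),
    "billing": account_deletion_action("compliance_regulated", today, date(2030, 1, 1)),
    "expired": account_deletion_action("compliance_regulated", today, date(2025, 1, 1)),
    "allowlist": account_deletion_action("system_owned", today),
}
```

A schema supports this function only if every row exposes its classification and, where relevant, its retention date, which is why schemas lacking them are treated as invalid.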


Observability and Audit Support

Data architecture MUST enable:

  • Access logging for regulated data
  • Traceability of mutations
  • Auditable change history
  • Correlation with enforcement actions
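
A mutation record that carries an optional enforcement identifier is one possible shape for the change history described above. The Mutation fields, the in-memory HISTORY list, and the sample identifiers are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Mutation:
    table: str
    row_id: str
    actor_id: str
    before: Optional[dict]         # None for inserts
    after: Optional[dict]          # None for deletes
    enforcement_id: Optional[str]  # links the change to an enforcement action
    at: datetime


HISTORY: list[Mutation] = []


def record_mutation(table, row_id, actor_id, before, after, enforcement_id=None):
    """Append an immutable change record to the history."""
    entry = Mutation(table, row_id, actor_id, before, after,
                     enforcement_id, datetime.now(timezone.utc))
    HISTORY.append(entry)
    return entry


def mutations_for_enforcement(enforcement_id: str) -> list[Mutation]:
    """Find every change made as part of one enforcement action."""
    return [m for m in HISTORY if m.enforcement_id == enforcement_id]


record_mutation("mailboxes", "mb-1", "user-a", None, {"status": "active"})
record_mutation("mailboxes", "mb-1", "admin-1",
                {"status": "active"}, {"status": "suspended"},
                enforcement_id="enf-42")
linked = mutations_for_enforcement("enf-42")
```

Here linked contains only the suspension, showing how a reviewer could trace an enforcement action to the exact rows it changed.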

Non-Negotiable Rules

  • Postgres is authoritative.
  • Ownership is explicit.
  • S3 never stands alone.
  • Compliance data is isolated.
  • Architecture supports enforcement.

References

  • docs/governance/doc-authority.new.md
  • docs/contracts/authz-and-roles.new.md
  • docs/contracts/data-retention.new.md
  • docs/contracts/rate-limits-and-abuse.new.md
  • docs/contracts/rls-standard.new.md
  • docs/architecture/observability-plan.new.md
