Data Architecture
Status: Canonical (Draft)
Last Updated: 2026-02-06
Owner: Engineering
Purpose
This document defines the data architecture for CAIRL.
It describes:
- How data is structured and classified
- Where data lives (Postgres vs S3)
- Ownership and access boundaries
- How compliance-regulated data is isolated
- How data architecture supports RLS, retention, and abuse controls
This document is descriptive. All enforcement rules are defined in the contracts listed under References.
Core Principles
- Postgres is the system of record.
- Ownership is explicit at the row level.
- Compliance-regulated data is isolated and auditable.
- Object storage is never authoritative on its own.
- Data architecture must support deletion, retention, and access logging.
Primary Data Stores
Supabase Postgres
Supabase Postgres is the authoritative datastore for:
- User identities and profiles
- Application state
- Metadata for all uploaded objects
- Compliance and audit records
- Billing and partner events
All critical data MUST be represented in Postgres, even if large payloads are stored elsewhere.
AWS S3
AWS S3 is used for:
- Large binary objects
- Uploaded documents
- HIPAA-regulated files
- Derived assets (previews, exports)
Rules:
- S3 objects MUST have a corresponding Postgres record
- Buckets are private by default
- Access is mediated via signed URLs
- S3 lifecycle policies MUST align with retention contracts
S3 is never a source of truth by itself.
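One way to check the "S3 objects MUST have a corresponding Postgres record" rule is a periodic reconciliation between a bucket listing and the metadata table. The following is a minimal sketch, not CAIRL's actual implementation: ObjectRecord and find_orphans are hypothetical names, and both inputs are assumed to have been fetched already (the S3 listing and the Postgres query are out of scope here).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRecord:
    """Hypothetical Postgres metadata row describing one S3 object."""
    bucket: str
    key: str


def find_orphans(s3_keys: set[tuple[str, str]], records: list[ObjectRecord]):
    """Compare an S3 listing with Postgres metadata.

    Returns (objects_without_record, records_without_object), both sorted.
    """
    recorded = {(r.bucket, r.key) for r in records}
    # Objects in S3 that Postgres does not know about violate the rule above.
    objects_without_record = sorted(s3_keys - recorded)
    # Records pointing at missing objects usually indicate a failed upload or cleanup.
    records_without_object = sorted(recorded - s3_keys)
    return objects_without_record, records_without_object
```

Because Postgres is the system of record, an object without a record would normally be treated as the anomaly to investigate, rather than the missing record being recreated from S3.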
Data Classification Model
Every table and object MUST fall into one of the following categories.
User-Owned Data
Definition: Data owned by a specific user.
Characteristics:
- Contains a user_id column
- Subject to RLS
- Deleted on account deletion unless a retention override applies
Examples:
- Messages
- User documents
- Preferences
- Mailboxes
Compliance-Regulated Data
Definition: Data subject to statutory or contractual retention.
Characteristics:
- May survive account deletion
- Access logging required
- Retention windows enforced
Examples:
- HIPAA documents
- Biometric verification records
- Audit logs
- Billing records
System-Owned Data
Definition: Platform-controlled data not owned by a single user.
Characteristics:
- Restricted access
- Admin-only by default
- Not user-deletable
Examples:
- Allowlists
- Suppression lists
- Abuse reports
- Partner configuration
Derived / Ephemeral Data
Definition: Non-authoritative data derived from other sources.
Characteristics:
- Recomputable or discardable
- Time-limited
- Not relied on for correctness
Examples:
- Caches
- Previews
- Temporary exports
- Job state
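The requirement that every table and object fall into exactly one category can be made checkable with a small registry. The sketch below is illustrative only: the DataClass enum mirrors the four categories above, while the table names in TABLE_CLASSIFICATION and the unclassified_tables helper are assumptions, not names taken from the CAIRL schema.

```python
from enum import Enum


class DataClass(Enum):
    USER_OWNED = "user_owned"
    COMPLIANCE_REGULATED = "compliance_regulated"
    SYSTEM_OWNED = "system_owned"
    DERIVED_EPHEMERAL = "derived_ephemeral"


# Hypothetical registry: each table maps to exactly one category.
TABLE_CLASSIFICATION = {
    "messages": DataClass.USER_OWNED,
    "audit_logs": DataClass.COMPLIANCE_REGULATED,
    "suppression_lists": DataClass.SYSTEM_OWNED,
    "preview_cache": DataClass.DERIVED_EPHEMERAL,
}


def unclassified_tables(all_tables, registry=TABLE_CLASSIFICATION):
    """Return the sorted names of tables that have no classification yet."""
    return sorted(t for t in all_tables if t not in registry)
```

A check like this could run in CI against the list of tables in a migration, failing when a new table is added without a classification.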
Table Design Standards
All Postgres tables MUST follow these rules:
- Primary key is UUID
- Ownership columns are explicit (user_id, owner_id)
- created_at and updated_at timestamps are present
- Foreign keys are enforced
- Cascades are intentional and documented
Soft deletes are permitted only if documented and justified.
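Some of these standards lend themselves to an automated lint over table definitions. The following sketch assumes a hypothetical TableSpec description of a table and a deleted_at column as the soft-delete convention; neither is defined by this document, and the check covers only the primary key, timestamp, and soft-delete rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableSpec:
    """Hypothetical, simplified description of one Postgres table."""
    name: str
    columns: dict[str, str]  # column name -> Postgres type
    primary_key: str = "id"
    soft_delete_justification: str | None = None


def check_table(spec: TableSpec) -> list[str]:
    """Return a list of violations of the table design standards."""
    problems = []
    if spec.columns.get(spec.primary_key, "").lower() != "uuid":
        problems.append("primary key must be a UUID column")
    for col in ("created_at", "updated_at"):
        if col not in spec.columns:
            problems.append(f"missing {col} timestamp")
    # Assumption: soft deletes are modeled with a deleted_at column.
    if "deleted_at" in spec.columns and not spec.soft_delete_justification:
        problems.append("soft delete column present without documented justification")
    return problems
```

For example, a table with a bigint primary key, no updated_at column, and an unexplained deleted_at column would produce three violations, while a table that follows the standards produces an empty list.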
Ownership and Relationships
- User-owned tables MUST reference users(id)
- Compliance-regulated tables MUST document ownership semantics
- Many-to-many relationships MUST use join tables
- Cross-user access MUST be intentional and auditable
Implicit ownership is prohibited.
HIPAA Data Isolation
HIPAA-regulated data MUST:
- Be clearly labeled at the table and object level
- Be isolated via RLS policies
- Have access logged
- Have retention windows enforced
HIPAA data MAY reside in the same database as other data but MUST be logically isolated.
Biometric Data Handling
Biometric data:
- Is compliance-regulated
- Is stored as objects (S3) with metadata in Postgres
- Uses rolling retention rules
- Is fully deleted on account deletion
Biometric data MUST NOT be reused for non-verification purposes.
Billing and Partner Data
Billing and partner data:
- Is system-owned or compliance-regulated
- Is not user-deletable
- Has explicit retention windows
- Is auditable for disputes
Partner events MUST be deduplicated at the data layer where applicable.
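Deduplication at the data layer is commonly done with a unique constraint plus an insert that ignores conflicts. The sketch below uses an in-memory SQLite database purely as a stand-in so the example is runnable; the partner_events table, its columns, and the choice of (partner_id, event_id) as the deduplication key are assumptions. The ON CONFLICT ... DO NOTHING clause shown here is also valid Postgres syntax.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the partner events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE partner_events (
            partner_id TEXT NOT NULL,
            event_id   TEXT NOT NULL,
            payload    TEXT NOT NULL,
            UNIQUE (partner_id, event_id)
        )
        """
    )
    return conn


def record_event(conn, partner_id: str, event_id: str, payload: str) -> bool:
    """Insert an event once; return False if it was already recorded."""
    cur = conn.execute(
        "INSERT INTO partner_events (partner_id, event_id, payload) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (partner_id, event_id) DO NOTHING",
        (partner_id, event_id, payload),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the conflict clause skipped the insert.
    return cur.rowcount == 1
```

Letting the database enforce uniqueness keeps deduplication correct even when a partner redelivers the same event to several workers at once.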
Data Access Patterns
- Client access uses Supabase client with RLS
- Server access uses least-privileged credentials
- Service-role access is limited and logged
- Bulk access requires explicit admin context
Any access path that bypasses RLS MUST do so deliberately, never as a side effect.
Data Lifecycle Alignment
Data architecture MUST support:
- Account deletion semantics
- Retention overrides
- Anonymization where permitted
- Automated cleanup jobs
- Audit and compliance review
Schemas that prevent deletion or retention enforcement are invalid.
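The interaction between account deletion and the classification model can be sketched as a small decision function. This is an illustration under stated assumptions rather than the rule set itself, which lives in the retention contract: the Action names and the on_account_deletion function are hypothetical, and the sketch assumes that a single retention_override flag is enough to decide whether data outlives the account.

```python
from enum import Enum


class DataClass(Enum):
    USER_OWNED = "user_owned"
    COMPLIANCE_REGULATED = "compliance_regulated"
    SYSTEM_OWNED = "system_owned"
    DERIVED_EPHEMERAL = "derived_ephemeral"


class Action(Enum):
    DELETE = "delete"
    RETAIN_UNTIL_WINDOW_ENDS = "retain_until_window_ends"
    KEEP = "keep"


def on_account_deletion(data_class: DataClass, retention_override: bool = False) -> Action:
    """Decide what happens to a user's data when the account is deleted."""
    if data_class is DataClass.SYSTEM_OWNED:
        # System-owned data is not user-deletable.
        return Action.KEEP
    if data_class is DataClass.DERIVED_EPHEMERAL:
        # Derived data is recomputable or discardable, so it is simply dropped.
        return Action.DELETE
    if retention_override:
        # User-owned or compliance-regulated data held by a retention window.
        return Action.RETAIN_UNTIL_WINDOW_ENDS
    return Action.DELETE
```

Under these assumptions, biometric records would be passed without an override, since they are fully deleted on account deletion, while billing records would carry one.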
Observability and Audit Support
Data architecture MUST enable:
- Access logging for regulated data
- Traceability of mutations
- Auditable change history
- Correlation with enforcement actions
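Access logging for regulated data can be illustrated with a reader that records every lookup before returning the row. The sketch below is hypothetical throughout: AccessLogEntry, AuditedReader, and the in-memory dictionary standing in for a Postgres table are assumptions, and a real implementation would persist log entries to an audit table rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessLogEntry:
    actor_id: str
    table: str
    row_id: str
    action: str
    at: datetime


class AuditedReader:
    """Wraps reads of a regulated table so every access leaves a log entry."""

    def __init__(self, rows: dict, table: str):
        self._rows = rows  # row_id -> row; stand-in for a Postgres query
        self._table = table
        self.log = []

    def read(self, actor_id: str, row_id: str):
        # Log before returning so lookups of missing rows are recorded too.
        self.log.append(
            AccessLogEntry(actor_id, self._table, row_id, "read", datetime.now(timezone.utc))
        )
        return self._rows.get(row_id)
```

Recording the actor, table, row, and timestamp together is what makes later correlation with enforcement actions possible.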
Non-Negotiable Rules
- Postgres is authoritative.
- Ownership is explicit.
- S3 never stands alone.
- Compliance data is isolated.
- Architecture supports enforcement.
References
- docs/governance/doc-authority.new.md
- docs/contracts/authz-and-roles.new.md
- docs/contracts/data-retention.new.md
- docs/contracts/rate-limits-and-abuse.new.md
- docs/contracts/rls-standard.new.md
- docs/architecture/observability-plan.new.md
End of Document