
Data Model

Uniflow uses a multi-table DynamoDB design with 6 dedicated tables. Each entity type has its own table for clarity and independent scaling.

| Table | Partition key | Sort key | GSI | Description |
| --- | --- | --- | --- | --- |
| profilesTable | userId (S) | sortKey (S) | — | User profile (`META`) and event history (`EVENT#ts#id`); TTL enabled, PITR on |
| identityTable | anonymousId (S) | — | — | Maps an anonymous visitor to a known userId |
| sourcesTable | id (S) | — | writeKeyHashIndex on writeKeyHash | Event sources with write-key authentication |
| destinationsTable | id (S) | — | — | Destination connector configurations |
| segmentsTable | id (S) | — | — | Segment definitions (rules) |
| segmentMembersTable | segmentId (S) | userId (S) | — | User–segment membership records |

All tables use PAY_PER_REQUEST (on-demand) billing, so you pay only for actual reads and writes with no capacity planning, and a RETAIN removal policy, so table data survives stack deletion.
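As a sketch, the profilesTable above could be provisioned with DynamoDB's low-level CreateTable API roughly as follows. The key schema, attribute types, and billing mode come from the table above; expressing it as a plain boto3-style request dict is an assumption (in practice these tables may be declared via CDK or CloudFormation instead).

```python
# Sketch of a CreateTable request for profilesTable, assuming boto3-style
# request shapes. Key schema and billing mode mirror the table above.
profiles_table_spec = {
    "TableName": "profilesTable",
    "AttributeDefinitions": [
        {"AttributeName": "userId", "AttributeType": "S"},
        {"AttributeName": "sortKey", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "userId", "KeyType": "HASH"},    # partition key
        {"AttributeName": "sortKey", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand: no capacity planning
}

# To actually create the table:
# import boto3
# boto3.client("dynamodb").create_table(**profiles_table_spec)
```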

Common access patterns:

Table: profilesTable
Key: { userId: "user_123", sortKey: "META" }
→ Returns the merged profile with all traits

Table: profilesTable
Key: { userId: "user_123", sortKey begins_with "EVENT#" }
→ Returns all events for the user, sorted by timestamp

Table: identityTable
Key: { anonymousId: "abc-123" }
→ Returns the linked userId for an anonymous visitor

Table: sourcesTable
GSI: writeKeyHashIndex
Key: { writeKeyHash: "<sha256>" }
→ Returns the source matching a write key

Table: segmentMembersTable
Key: { segmentId: "seg_456" }
→ Returns all users in a segment
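These patterns translate directly into low-level DynamoDB requests. A minimal sketch, assuming boto3-style request shapes; table, index, and attribute names come from this page, while the helper names and the example write key are illustrative:

```python
import hashlib


def profile_events_query(user_id: str) -> dict:
    """Query request for all of a user's events, sorted by the EVENT#ts#id sort key."""
    return {
        "TableName": "profilesTable",
        "KeyConditionExpression": "userId = :uid AND begins_with(sortKey, :p)",
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":p": {"S": "EVENT#"},
        },
    }


def identity_lookup(anonymous_id: str) -> dict:
    """GetItem request mapping an anonymous visitor to a known userId."""
    return {
        "TableName": "identityTable",
        "Key": {"anonymousId": {"S": anonymous_id}},
    }


def source_by_write_key(write_key: str) -> dict:
    """Query the writeKeyHashIndex GSI; the raw write key is hashed before lookup."""
    digest = hashlib.sha256(write_key.encode()).hexdigest()
    return {
        "TableName": "sourcesTable",
        "IndexName": "writeKeyHashIndex",
        "KeyConditionExpression": "writeKeyHash = :h",
        "ExpressionAttributeValues": {":h": {"S": digest}},
    }


# Usage, e.g.: boto3.client("dynamodb").query(**profile_events_query("user_123"))
```

Hashing the write key before the GSI lookup means the raw key never needs to be stored in the table, only its SHA-256 digest.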

Raw events are also stored in S3 via Kinesis Firehose as GZIP-compressed NDJSON (newline-delimited JSON), partitioned by date:

s3://uniflow-events-{account}/
  raw/
    year=2025/
      month=03/
        day=08/
          events-00001.json.gz
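To make the storage format concrete, here is a small sketch that builds the date-partitioned key prefix and round-trips events through GZIP-compressed NDJSON, the same layout shown above. The helper names and sample event fields are illustrative, not part of Uniflow:

```python
import gzip
import json
from datetime import datetime


def partition_prefix(ts: datetime) -> str:
    """Build the raw/ date-partition prefix used in the bucket layout above."""
    return f"raw/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


def to_ndjson_gz(events: list[dict]) -> bytes:
    """Serialize events as newline-delimited JSON, then GZIP-compress."""
    body = "".join(json.dumps(e) + "\n" for e in events)
    return gzip.compress(body.encode("utf-8"))


def from_ndjson_gz(blob: bytes) -> list[dict]:
    """Inverse of to_ndjson_gz: decompress, then parse one event per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

One event per line means downstream readers (Glue, Athena, a plain `zcat`) can stream the file without loading a whole JSON array into memory.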

This data lake powers the Glue PySpark audience builder for segment evaluation. A Glue Catalog table provides the schema for SQL access. Segment membership results are also written to S3 as Parquet files under s3://{processed-bucket}/segments/{segment_id}/members.parquet.
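The membership output path can be derived mechanically, and segment evaluation itself reduces to filtering profiles against a rule. A pure-Python sketch: the equality-only rule format below is an assumption for illustration, since the real audience builder runs as a Glue PySpark job against the Catalog table and may support richer operators:

```python
def members_key(processed_bucket: str, segment_id: str) -> str:
    """S3 location of a segment's membership Parquet file, per the layout above."""
    return f"s3://{processed_bucket}/segments/{segment_id}/members.parquet"


def evaluate_segment(profiles: list[dict], rule: dict) -> list[str]:
    """Return userIds whose traits match every key/value pair in `rule`.

    Illustrative only: assumes simple trait equality, whereas real segment
    rules may include comparisons, event counts, etc.
    """
    return [
        p["userId"]
        for p in profiles
        if all(p.get("traits", {}).get(k) == v for k, v in rule.items())
    ]
```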
