
The bucket is the database

NamiDB has no external control plane. No Raft cluster. No ZooKeeper. No DynamoDB lock table. No etcd. The bucket is the database — every byte of engine state is a plain object in the S3-compatible store you opened with tg.Client("s3://...").

What lives in the bucket

s3://my-bucket/data/{namespace}/
├── manifest.json        # CAS root: epoch, current SST list, LSN watermark
├── wal/                 # Write-ahead log segments
│   ├── 0000-0042.wal
│   └── 0043-current.wal
├── sst/                 # Sorted-string tables
│   ├── node/L0/...      # Parquet node SSTs
│   ├── node/L1/...
│   ├── edge/L0/...      # Custom edge SSTs with CSR adjacency
│   └── edge/L1/...
└── schema/              # Label & property schemas
    └── current.json

Three categories:

  1. The manifest: a single, tiny JSON object that names every SST currently live for the namespace, together with the epoch and the LSN watermark. All writes coordinate through a compare-and-swap (CAS) on this one object.
  2. The WAL: append-only segments. A write is durable once its commit_batch call returns.
  3. SSTs: immutable columnar files. Nodes go to Parquet; edges go to a custom CSR-aware format (RFC-002).
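
To make the manifest's role concrete, here is a minimal Python sketch of what such an object could look like. The field names (epoch, lsn_watermark, ssts), the reading of the watermark as "highest LSN already flushed into SSTs", and the helper with_flushed_sst are all assumptions for illustration; the actual manifest.json layout is not specified on this page.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Manifest:
    """Hypothetical shape of manifest.json; field names are illustrative."""
    epoch: int = 0
    # Assumed meaning: highest LSN whose data is already in an SST.
    lsn_watermark: int = 0
    # Object keys of the SSTs that are currently live for the namespace.
    ssts: list[str] = field(default_factory=list)

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "Manifest":
        return cls(**json.loads(raw))


def with_flushed_sst(m: Manifest, sst_key: str, flushed_lsn: int) -> Manifest:
    """Build the next manifest after a flush; the previous one is not mutated."""
    return Manifest(
        epoch=m.epoch,
        lsn_watermark=max(m.lsn_watermark, flushed_lsn),
        ssts=[*m.ssts, sst_key],
    )
```

Because SSTs are immutable, a flush in this sketch never edits an existing file: it writes a new SST object and then publishes a new manifest that names it.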

What replaces the consensus tier

S3 conditional writes. Since 2024, S3 honours If-Match / If-None-Match headers on PutObject. NamiDB writes a new manifest with If-Match: <previous-etag>; the first writer wins, and every other writer gets a 412 Precondition Failed, re-reads the manifest, and retries.

That single primitive replaces:

Without conditional writes | With conditional writes
External lock service (DynamoDB, ZooKeeper) | Manifest CAS on the object itself
Raft / Paxos quorum for the manifest | Conditional PutObject
A separate metadata DB | A manifest.json per namespace
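
The read-modify-write loop built on that primitive can be sketched in Python as follows. This is not NamiDB's implementation: FakeBucket is a hypothetical in-memory stand-in for an S3-compatible store, PreconditionFailed stands in for the HTTP 412 response, and commit is an illustrative helper name. Only the If-Match / If-None-Match semantics are taken from the text above.

```python
import hashlib
import json


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 response from the object store."""


class FakeBucket:
    """Hypothetical in-memory store that honours conditional PUTs."""

    def __init__(self):
        self._objects = {}  # key -> (body, etag)

    def get(self, key):
        return self._objects[key]

    def put(self, key, body, if_match=None, if_none_match=None):
        current = self._objects.get(key)
        # If-None-Match: * means "create only if the key does not exist".
        if if_none_match == "*" and current is not None:
            raise PreconditionFailed(key)
        # If-Match means "overwrite only if the stored ETag is still this one".
        if if_match is not None and (current is None or current[1] != if_match):
            raise PreconditionFailed(key)
        # Any content-derived tag is enough for this sketch.
        etag = hashlib.sha256(body).hexdigest()
        self._objects[key] = (body, etag)
        return etag


def commit(bucket, key, mutate, max_attempts=5):
    """Read-modify-write the manifest, retrying when another writer wins."""
    for _ in range(max_attempts):
        body, etag = bucket.get(key)
        new_manifest = mutate(json.loads(body))
        try:
            return bucket.put(key, json.dumps(new_manifest).encode(), if_match=etag)
        except PreconditionFailed:
            continue  # lost the race: re-read the manifest and try again
    raise RuntimeError("manifest CAS did not succeed")
```

A writer that still holds an ETag from before someone else's commit gets PreconditionFailed on its put, which is the whole coordination mechanism: no lock is ever taken, and a crashed writer leaves nothing behind to clean up.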

What this buys you

  • Durability is whatever S3 already gives you: S3 Standard is designed for 99.999999999% (11 nines) durability, with objects stored across multiple Availability Zones.
  • Backups are aws s3 sync. There is no separate metadata to capture.
  • Restore is aws s3 sync in the other direction.
  • Compute cost scales to zero when no client has the namespace open. Nothing is running and no DynamoDB capacity is reserved; you pay only for the stored objects.
  • Tenants are folders. Each ?ns=... is a sub-tree in the bucket.
  • Two processes can open the same namespace. The one that wins the manifest CAS at commit time gets to write; the other fences cleanly (epoch increment) and re-reads.
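
The "tenants are folders" point can be illustrated with a small Python sketch that maps a store URI to the key prefix holding one namespace. The URI shape s3://bucket/path?ns=name is an assumption pieced together from the ?ns=... parameter and the bucket layout shown earlier, and namespace_prefix is a hypothetical helper, not part of the client API.

```python
from urllib.parse import urlsplit, parse_qs


def namespace_prefix(uri: str) -> tuple[str, str]:
    """Split a store URI into (bucket, key prefix for the namespace)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    ns_values = parse_qs(parts.query).get("ns")
    if not ns_values or len(ns_values) != 1:
        raise ValueError("URI must carry exactly one ?ns= parameter")
    namespace = ns_values[0]
    # Reject values that would escape or collapse the tenant's sub-tree.
    if "/" in namespace or namespace in (".", ".."):
        raise ValueError(f"invalid namespace {namespace!r}")
    base = parts.path.strip("/")
    prefix = f"{base}/{namespace}/" if base else f"{namespace}/"
    return parts.netloc, prefix
```

Under these assumptions, every object of a tenant shares one prefix, so backing up, restoring, or deleting a single tenant is a prefix-scoped operation on the bucket.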

What you give up

  • Write throughput per namespace is bounded by one writer at a time. This is a feature for correctness but a ceiling for raw write rate. Sharding by namespace is the answer when you need more.
  • Read latency is bounded below by the S3 GET latency for the hot path. Cross-snapshot caches (RFC-018, RFC-019, RFC-020) hide most of it for repeated queries.
  • Strong cross-namespace transactions are out of scope. Each namespace is an isolated unit.
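
As a rough illustration of sharding by namespace, the Python sketch below routes each record to one of N namespaces with a stable hash. The function name, the routing key, and the base-NNN naming scheme are all hypothetical; the page does not prescribe how shards should be named or chosen.

```python
import hashlib


def shard_namespace(key: str, base: str, shards: int) -> str:
    """Map a routing key to one of `shards` namespaces with a stable hash."""
    if shards < 1:
        raise ValueError("shards must be >= 1")
    # sha256 is stable across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % shards
    return f"{base}-{index:03d}"
```

Since each namespace is an isolated unit with its own single-writer manifest, the routing key should be chosen so that data which must be written atomically together lands in the same shard.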

See also