The bucket is the database
NamiDB has no external control plane. No Raft cluster. No
ZooKeeper. No DynamoDB lock table. No etcd. The bucket is the
database — every byte of engine state is a plain object in the
S3-compatible store you opened with tg.Client("s3://...").
What lives in the bucket
s3://my-bucket/data/{namespace}/
├── manifest.json          # CAS root: epoch, current SST list, LSN watermark
├── wal/                   # Write-ahead log segments
│   ├── 0000-0042.wal
│   └── 0043-current.wal
├── sst/                   # Sorted-string tables
│   ├── node/L0/...        # Parquet node SSTs
│   ├── node/L1/...
│   ├── edge/L0/...        # Custom edge SSTs with CSR adjacency
│   └── edge/L1/...
└── schema/                # Label & property schemas
    └── current.json

Three categories:
- The manifest: a single, tiny JSON object that names every SST currently live for the namespace, plus the epoch, plus the LSN watermark. All writes coordinate through manifest CAS.
- The WAL: append-only segments. Every write is durable as soon as a commit_batch call returns.
- SSTs: immutable columnar files. Nodes go to Parquet; edges go to a custom CSR-aware format (RFC-002).
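To make the manifest's role concrete, here is a minimal sketch of what such an object could look like in Python. The source only states that the manifest carries an epoch, the list of live SSTs, and an LSN watermark; the field names, the `Manifest` class, the `manifest_key` helper, and the SST file names below are all hypothetical and are not NamiDB's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Manifest:
    # Hypothetical field names: the docs only say the manifest holds an
    # epoch, the live SST list, and an LSN watermark.
    epoch: int = 0
    lsn_watermark: int = 0
    ssts: list[str] = field(default_factory=list)

    def to_bytes(self) -> bytes:
        # Serialize to the JSON body that would be stored as manifest.json.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "Manifest":
        # Parse a manifest body read back from the bucket.
        return cls(**json.loads(raw.decode("utf-8")))


def manifest_key(namespace: str) -> str:
    # Object key relative to the bucket, following the layout shown above.
    return f"data/{namespace}/manifest.json"


# Placeholder SST names, for illustration only.
m = Manifest(
    epoch=3,
    lsn_watermark=42,
    ssts=["sst/node/L0/example-node-sst", "sst/edge/L0/example-edge-sst"],
)
roundtrip = Manifest.from_bytes(m.to_bytes())
```

Because the manifest is the only mutable root, a reader that has parsed it knows exactly which immutable SST objects to fetch.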
What replaces the consensus tier
S3 conditional writes. Since 2024, S3 honours If-Match /
If-None-Match headers on PutObject. NamiDB writes a new manifest
with If-Match: <previous-etag>; the first writer wins, the rest get
a 412 Precondition Failed and retry.
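The read-modify-write loop described above can be sketched as follows. This is an illustration under stated assumptions, not NamiDB's implementation: `FakeBucket` is a hypothetical in-memory stand-in for an object store that honours If-Match / If-None-Match, `PreconditionFailed` stands in for the 412 response, and the manifest is reduced to a bare SST list. Against real S3, recent boto3 versions accept `IfMatch` and `IfNoneMatch` arguments on `put_object`, which would take the place of the fake.

```python
import json


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class FakeBucket:
    """Hypothetical in-memory object store with conditional PUT."""

    def __init__(self):
        self._objects = {}  # key -> (body, etag)
        self._version = 0   # counter used to mint a fresh ETag per write

    def get(self, key):
        # Returns (body, etag); raises KeyError if the object is missing.
        return self._objects[key]

    def put(self, key, body, if_match=None, if_none_match=None):
        current = self._objects.get(key)
        if if_none_match == "*" and current is not None:
            raise PreconditionFailed(key)  # create-only, but object exists
        if if_match is not None and (current is None or current[1] != if_match):
            raise PreconditionFailed(key)  # someone else wrote first
        self._version += 1
        etag = f"v{self._version}"
        self._objects[key] = (body, etag)
        return etag


def commit(bucket, key, new_sst, max_attempts=5):
    """Append new_sst to the manifest via CAS; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        body, etag = bucket.get(key)
        manifest = json.loads(body)
        manifest["ssts"].append(new_sst)
        try:
            bucket.put(key, json.dumps(manifest).encode(), if_match=etag)
            return attempt
        except PreconditionFailed:
            continue  # lost the race: re-read the manifest and try again
    raise RuntimeError("manifest CAS did not succeed")


bucket = FakeBucket()
KEY = "data/demo/manifest.json"
bucket.put(KEY, json.dumps({"ssts": []}).encode(), if_none_match="*")

# Two writers read the same manifest version.
_, etag_a = bucket.get(KEY)
_, etag_b = bucket.get(KEY)

# Writer A commits first and wins.
bucket.put(KEY, json.dumps({"ssts": ["a"]}).encode(), if_match=etag_a)

# Writer B's PUT carries a stale ETag and is rejected.
try:
    bucket.put(KEY, json.dumps({"ssts": ["b"]}).encode(), if_match=etag_b)
    loser_rejected = False
except PreconditionFailed:
    loser_rejected = True

# Writer B retries through the loop, which re-reads before writing.
attempts = commit(bucket, KEY, "b")
final = json.loads(bucket.get(KEY)[0])["ssts"]
```

The losing writer never overwrites the winner's SST list; it can only land a change built on top of the version it just re-read.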
That single primitive replaces:
| Without conditional writes | With conditional writes |
|---|---|
| External lock service (DynamoDB, ZooKeeper) | Manifest CAS on the object itself |
| Raft / Paxos quorum for the manifest | Conditional PutObject |
| A separate metadata DB | A manifest.json per namespace |
What this buys you
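- Durability is whatever S3 already gives you: S3 Standard is designed for 99.999999999% (eleven nines) durability, with objects stored across multiple Availability Zones.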
- Backups are aws s3 sync. There is no separate metadata to capture.
- Restore is aws s3 sync in the other direction.
- Cost scales to zero when no client opens the namespace. No compute is running. No DynamoDB capacity is reserved.
- Tenants are folders. Each ?ns=... is a sub-tree in the bucket.
- Two processes can open the same namespace. The one that wins the manifest CAS at commit time gets to write; the other fences cleanly (epoch increment) and re-reads.
What you give up
- Write throughput per namespace is bounded by one writer at a time. This is a feature for correctness but a ceiling for raw write rate. Sharding by namespace is the answer when you need more.
- Read latency is bounded below by the S3 GET latency for the hot path. Cross-snapshot caches (RFC-018, RFC-019, RFC-020) hide most of it for repeated queries.
- Strong cross-namespace transactions are out of scope. Each namespace is an isolated unit.
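Sharding by namespace, mentioned above as the way past the single-writer ceiling, can be sketched as a stable hash from a routing key to a namespace name. The `namespace_for` function, the `graph` prefix, and the zero-padded naming scheme are assumptions made for this example; NamiDB does not prescribe them.

```python
import hashlib


def namespace_for(routing_key: str, shards: int, prefix: str = "graph") -> str:
    """Map a routing key to one of `shards` namespace names, deterministically."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    # SHA-256 gives the same result in every process and on every machine,
    # unlike Python's built-in hash(), which is salted per process for strings.
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % shards
    return f"{prefix}-{shard:03d}"


# Every client computes the same namespace for the same key.
ns = namespace_for("user:42", shards=8)
```

Since cross-namespace transactions are out of scope, data that must commit atomically should share a routing key so that it lands in the same namespace. Changing the shard count remaps keys, so pick it with headroom.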