RFC 013: Parquet predicate pushdown
Mirrored from
docs/rfc/013-parquet-predicate-pushdown.mdin the engine repo. Source of truth lives there. Status: draft Author(s): Matías Fonseca info@namidb.com Builds on: RFC-008 (Logical Plan IR), RFC-010 (cost model), RFC-011 (predicate pushdown), RFC-002 §4 (SST stats) Supersedes: —
Summary
Cosecha los min/max que el writer ya emite por row-group (RFC-002 §4
- read-back desde el Parquet footer) para evitar decodificar row-groups que no pueden contener filas que satisfacen el WHERE.
El predicate pushdown estructural (RFC-011) empujó cada Filter lo más
cerca del leaf (NodeScan) que pudo; este siguiente paso convierte el
Filter inmediatamente sobre NodeScan en predicates del
NodeScan, que el storage layer consume durante el scan para descartar
row-groups completos sin decodificar.
Alcance:
- Tipo
ScanPredicateennamidb-storage::sst::predicates(Eq / Lt / LtEq / Gt / GtEq / Between / IsNull / IsNotNull / In) contra una sola columna y un literal canónico (StatScalar). eval_row_group(predicate, &PropertyColumnStats) -> RowGroupVerdictconservador:Absentsolo si los stats demuestran imposibilidad,MaybePresentcuando los stats faltan o no son concluyentes.NodeSstReader::scan_with_predicates(&[ScanPredicate])lee el Parquet metadata, evalúa cada predicate contra los stats por row-group, skipea cualquier row-group con verdictAbsentpara CUALQUIER predicate.Snapshot::scan_label_with_predicates(label, &[ScanPredicate])ySnapshot::scan_label(label)mantenido como wrapper descan_label_with_predicates(label, &[])(compat).LogicalPlan::NodeScanganapredicates: Vec<ScanPredicate>field. Default vacío para callers existentes.- Rewriter en
optimize::pushdown: cuando llega aNodeScanconpendingno-vacío, intenta convertir cada conjunct a unScanPredicatesobrealias.property. Los pushables van al NodeScan.predicates; los no-pushables permanecen comoFilter. - Executor pasa los predicates al storage en el callsite de
walker::execute_node_scan. - Cardinality evalúa selectividad de los predicates ya empujados
sobre el
node_countdel catalog (consistente con RFC-010 §3.1). - EXPLAIN VERBOSE:
NodeScan label=Person alias=a predicates=[a.age > 30, a.firstName = "Alice"].
Out-of-scope explícito:
- Parquet row-level filtering (Arrow
filteroperator dentro del reader). El executor ya tieneFiltery aplicarlo dos veces duplicaría trabajo. Solo hacemos row-group pruning en storage. - Predicates sobre Expand / edge SST. La RFC-002 sí define stats
por edge SST (
DegreeHistogram) pero las edges no tienen stats por propiedad arbitraria. Se mantiene fuera del v0. - OR predicates. Cada
ScanPredicatees un single-column AND conjunct.WHERE a.x = 1 OR a.y = 2no se empuja en v0 — requeriría unión de row-groups con bookkeeping del verdict por-row. Conjunctive-only. - Predicates cross-alias (
WHERE a.age = b.age). El storage no conoceb; el Filter cross-alias permanece arriba del NodeScan y cuando el HashJoin rewrite ya lo está convirtiendo a HashJoin, esto NO le quita nada. - Predicates derivados de parámetros. Los parámetros se
resuelven en runtime; el storage layer no los ve. Si el lowering
conoce el valor (constante), se baja como literal; si es un
parameter abierto, el Filter permanece arriba. Una posible
extensión post-v0: resolver parameters en
optimize::optimizecuando&Paramsse pase al pipeline. - Page-index pruning (Parquet 2.0 column index + offset index).
Vale para SSTs grandes con row-groups grandes pero requiere otra
layer de stats. Skip v0; el writer ya emite chunk-level stats
(
EnabledStatistics::Chunk) — suficiente. - Bloom filter check sobre eq predicates. El writer emite
bloom filters para
node_idpero no para propiedades arbitrarias. Extender el bloom a properties es un trade-off de espacio (bloom bytes son ~7×rows) — esto queda diferido si las stats min/max no alcanzan. - Cardinality estimate con dependencia entre predicates. Cada predicate aplica selectividad independiente, como RFC-010 §3.2. El catalog HLL daría correlación pero v0 mantiene la asunción de independencia.
Motivation
Sobre LDBC SNB SF1 (3M Person nodes en ~30 SSTs de 100k rows c/u, con
row-groups de 8192 rows = ~366 row-groups por label), una query
MATCH (a:Person) WHERE a.creationDate > '2020-01-01' RETURN a que
hoy:
- Lee TODOS los SSTs (cada uno ~10–40 MB sobre S3).
- Decodifica TODOS los row-groups (~366 Parquet decompressions).
- Filtra row-level en el executor: descarta ~99% de las filas.
Con predicate pushdown:
- Lee TODOS los SSTs footer + page index (~64 KiB c/u, no body).
- Por cada SST consulta
min/max(creationDate)por row-group. - Skip los row-groups cuyo
max(creationDate) < '2020-01-01'. - Decodifica solo los row-groups que pueden contener matches (~10–30 row-groups en lugar de 366).
Ahorro de IO sobre S3 (dominante en cloud): 10× típico para queries selectivas. Ahorro de CPU en decoding (Parquet deserialización): factor ~30×.
Esto es la última pieza del pushdown end-to-end del query layer al
storage. Sin ella, los min/max que el writer emite no se usan: solo
están en el catalog para el cost model.
Design
1. namidb-storage::sst::predicates
/// A single-column conjunctive predicate that the SST reader can/// evaluate against per-row-group stats to skip entire row-groups/// without decoding them.////// Each variant references a column **by its declared property name**/// (not by Parquet leaf path). The reader resolves to leaf index at/// scan time.#[derive(Clone, Debug, PartialEq)]pub enum ScanPredicate { Eq { column: String, value: StatScalar }, Lt { column: String, value: StatScalar }, LtEq { column: String, value: StatScalar }, Gt { column: String, value: StatScalar }, GtEq { column: String, value: StatScalar }, /// `Between { low, high }` is INCLUSIVE both sides. Equivalent to /// `Gte(low) AND Lte(high)`. Between { column: String, low: StatScalar, high: StatScalar }, IsNull { column: String }, IsNotNull { column: String }, In { column: String, values: Vec<StatScalar> },}The literal type — StatScalar — is the same one the writer emits
into PropertyColumnStats. This guarantees comparison between
predicate and stats lives in a single ordering.
Verdict
#[derive(Clone, Copy, Debug, PartialEq, Eq)]pub enum RowGroupVerdict { /// Stats prove no row in this row-group can satisfy the predicate. Absent, /// Stats are insufficient (missing min/max, type mismatch) OR /// stats overlap — rows may or may not match. Decode the /// row-group. MaybePresent,}Conservatism: we only return Absent when the stats prove the row-group
contains no match. Missing min/max ⇒ MaybePresent. Type mismatch
(predicate is Utf8 but stats are Int32) ⇒ MaybePresent (defensive;
treats the predicate as inapplicable rather than asserting).
Evaluation
pub fn eval_row_group( predicate: &ScanPredicate, stats: &PropertyColumnStats,) -> RowGroupVerdict;Algorithm per predicate (column known to match stats.name):
Eq(v):- If
min ≤ v ≤ max→ MaybePresent. Else Absent. - If min/max missing → MaybePresent.
Lt(v):- If
min < v→ MaybePresent (some row may be < v). Else Absent (all rows ≥ v). LtEq(v):min ≤ v→ MaybePresent. Else Absent.Gt(v):max > v→ MaybePresent. Else Absent.GtEq(v):max ≥ v→ MaybePresent. Else Absent.Between { low, high }: equivalent toGtEq(low) AND LtEq(high). Apply both; AND of verdicts = Absent if either is Absent.IsNull:null_count > 0→ MaybePresent. Else Absent.IsNotNull: row-group has at least one non-null value (null_count < row_count); we don’t haverow_countper stats here but we DO havenull_countand ifnull_count == 0we’re sure non-nulls exist (defensive: always MaybePresent in v0 when stats exist; falls back to row-level Filter for the per-row check).In { values }: build the closed interval[min(values), max(values)]; apply same logic asBetween. False-positives are accepted (some intermediate value may not be in theInlist — caught by the Filter operator above the NodeScan that the rewriter leaves intact as residual when In is partial).
Comparison between StatScalar variants follows the obvious type
matching: Int32 vs Int32, Float64 vs Float64, Utf8 vs Utf8, etc.
Cross-type comparison (e.g. Int32 vs Float64) returns MaybePresent
in v0; the optimizer doesn’t generate cross-type predicates because the
property type is declared in the schema.
NULL handling: min and max in PropertyColumnStats are computed
over non-null values only (writer convention; see RFC-002 §4.1). So a
column where every value is NULL has min=None, max=None, null_count=N. We evaluate IsNull from null_count alone, and any
ordered predicate (Eq/Lt/Gt/...) on a min/max=None column returns
MaybePresent (defensive — the row-group has no non-null rows that
could satisfy the predicate, but evaluating the predicate at row-level
will correctly drop NULLs via 3VL).
2. NodeSstReader::scan_with_predicates
impl NodeSstReader { pub fn scan_with_predicates( &self, predicates: &[ScanPredicate], ) -> Result<Vec<RecordBatch>>;}Algorithm:
- Build the Parquet reader once:
ParquetRecordBatchReaderBuilder::try_new(body). - Read the metadata; collect the leaf index for every property column referenced by any predicate.
- For each row-group:
- For each predicate, locate the column leaf in the row-group’s
ColumnChunkMetaData →
cc.statistics(). Map to a per-row-groupPropertyColumnStatssynthesizing only the fields evaluation needs (null_count,min,max— same coding ascompute_property_stats). - Evaluate each predicate with
eval_row_group. If ANY returnsAbsent, skip the row-group.
- Collect surviving row-group indices into
keep. - If
keepis empty, returnVec::new()(no decode). - Otherwise build the reader
with_row_groups(keep)and decode as inscan().
Cost: ~few µs per row-group of metadata inspection (the metadata is
already in memory from try_new). When all row-groups survive we
fall through to the same path as scan() and pay no extra IO.
3. Snapshot::scan_label_with_predicates
impl Snapshot<'_> { pub async fn scan_label(&self, label: &str) -> Result<Vec<NodeView>> { self.scan_label_with_predicates(label, &[]).await }
pub async fn scan_label_with_predicates( &self, label: &str, predicates: &[ScanPredicate], ) -> Result<Vec<NodeView>>;}The new variant:
- Iterates the memtable as
scan_labeldoes today, but additionally evaluates each predicate against the materialised NodeView’s property map. Memtable values are decoded already, so this is cheap (no IO). - For each SST scoped to
label, callsreader.scan_with_predicates(predicates)instead ofreader.scan(). - Returns the same
BTreeMap<NodeId, …>semantics: tombstones win; last-write-wins by LSN.
Memtable predicate evaluation is row-by-row in v0 and uses the same
3VL semantics as the executor’s Filter (Bool(true) → keep,
Bool(false) / Null → drop).
4. LogicalPlan::NodeScan change
NodeScan { label: String, alias: String, /// Predicates that have been pushed into the scan from a Filter /// directly above it. Empty for the lowering output; populated /// by `optimize::pushdown` when conjuncts qualify (see §5). /// The executor passes them verbatim to /// `Snapshot::scan_label_with_predicates`. predicates: Vec<ScanPredicate>,}PartialEq, Clone, Debug derive over the new field. All existing
constructions of NodeScan (lowering, tests) now use
predicates: Vec::new().
operator_name() remains "NodeScan". EXPLAIN VERBOSE renders the
predicates inline (see §6).
5. Rewriter in optimize::pushdown
The existing leaf arm:
LogicalPlan::Empty | LogicalPlan::Argument { .. } | LogicalPlan::NodeScan { .. } => { apply_filters(plan, pending)}becomes:
LogicalPlan::NodeScan { label, alias, predicates } => { let (pushable, residual) = classify_pending_for_scan(pending, &alias, &label_def); let mut merged = predicates; merged.extend(pushable); apply_filters( LogicalPlan::NodeScan { label, alias, predicates: merged }, residual, )}classify_pending_for_scan(pending, alias, label_def) returns:
pushable: Vec<ScanPredicate>— conjuncts that are single-column comparisons onalias.<property>with a literal/parameter (only literals in v0; parameters deferred) and reference NO other alias. The property must be declared inlabel_def.residual: Vec<Expression>— everything else; stays asFilterabove the NodeScan.
The classification function lives in
optimize::parquet_pushdown::classify (new module) and is unit
tested independently. The integration into pushdown_at is a single
arm change.
Why fold the parquet pushdown into the same pushdown_at pass instead
of a separate post-pass: the pending accumulator already carries
every conjunct that the existing pushdown was about to materialise as
Filter over NodeScan. Classifying them at the leaf is the natural
place — it costs O(|pending|) per leaf and avoids a second tree walk.
6. EXPLAIN VERBOSE
Project [a] (est=1500) NodeScan label=Person alias=a predicates=[a.age > 30, a.firstName = "Alice"] (est=1500)Plain EXPLAIN (no VERBOSE) also renders the predicates — they are
part of the operator shape, not annotations. EXPLAIN RAW shows the
pre-optimize lowering, where NodeScan has predicates: vec![] and
the conjuncts live in a Filter above.
predicates=[...] rendering uses each predicate’s Display:
Eq { column, value }→a.col = <lit>Lt { column, value }→a.col < <lit>Between { column, low, high }→a.col BETWEEN <low> AND <high>IsNull { column }→a.col IS NULLIsNotNull { column }→a.col IS NOT NULLIn { column, values }→a.col IN [<v1>, <v2>, ...]
The alias.col prefix comes from the NodeScan’s alias. Literals
render via their StatScalar Display.
7. Cardinality
The NodeScan arm in cost::cardinality::estimate_inner becomes:
LogicalPlan::NodeScan { label, alias, predicates } => { let base = catalog.label(label).map(|l| l.node_count as f64).unwrap_or(0.0); let sel = predicates_selectivity(predicates, catalog, label); let rows = base * sel; // bindings, leaf as before}predicates_selectivity reuses the existing selectivity machinery
from cost::selectivity by translating each ScanPredicate to its
Expression analogue (Eq → BinaryOp::Eq, etc.) and calling
selectivity(&expr, &binding_stats) where binding_stats is
seeded from the property stats in the catalog. Multi-predicate
combines under the independence assumption (RFC-010 §3.2).
Trade-off: we double-evaluate selectivity (once at NodeScan level for the pushed predicates, once at any residual Filter above). This is correct — Filter applies on top of the already-reduced NodeScan estimate.
8. Edge SSTs
Out of scope (see §“Out-of-scope”). EdgesFwd/Inv SSTs ship
DegreeHistogram but no per-property stats. Adding edge-property
stats requires the writer to track them per edge_type — a separate
RFC if/when needed.
Alternatives considered
A. Filter pushdown using DataFusion’s Expr
DataFusion has a full Expr language with a PhysicalPlanner that
converts pushable Expr to Parquet RowFilter. Rejected: bringing
DataFusion as a dependency for the pushdown ergonomics alone is a
mismatch — we’d still need a translator from our Expression to
their Expr, and the rest of our executor wouldn’t share the path.
A future morsel/vectorized iteration may revisit, but pushdown alone
doesn’t pay it.
B. Runtime adaptive sampling
Detect that a query is selective by sampling the first N rows and deciding pushdown on the fly. Rejected: defeats EXPLAIN/PROFILE story (plan changes at runtime), and our static stats are good enough for the v0 regime.
C. Encode predicates in a server-side filter pushdown to S3 Select
S3 Select supports SQL filters server-side but requires the body to be in CSV/JSON. Parquet Select is not GA. Rejected: incompatible with our storage format. Future feasibility check goes with edge storage RFCs.
D. Build a custom bloom filter per property column for eq pushdown
The writer would emit a bloom over each property’s hashed values. Eq
predicates probe the bloom before reading the row-group. Deferred:
RFC-002 explicitly limits blooms to node_id (for point lookups). A
property bloom is ~7×rows bytes — for 100k row SSTs that’s ~700 KiB
per property column. The space cost only pays off on cardinalities
that min/max-based pruning misses (which is rare — eq on high-NDV
columns is already covered by min/max for the values inside a row-group
and a NodeId-bloom alike). Track for follow-up if real workloads show it.
Drawbacks
-
No row-level filter in storage. Surviving row-groups still decode in full; the executor’s Filter then drops non-matching rows. For row-groups with ~50% selectivity this double-touches values. Mitigated by the executor’s Filter living in the same process — it’s cheap. Row-level pushdown in storage would couple Arrow’s
filteroperator to the reader (morsel direction). -
Single-column predicates only. Multi-column predicates (
a.x + a.y > 100) stay asFilter. v0 accepted. -
No parameter substitution in v0.
WHERE a.age > $minAgekeeps the Filter above the NodeScan since we don’t resolve$minAgeuntil execution. A later optimization passes&Paramsintooptimize::optimizeto constant-fold them; deferred. -
OR predicates are not pushed.
WHERE a.age > 30 OR a.firstName = "Alice"is one conjunct in the AND-split (the OR root), and pushability requires single-column ⇒ rejected. The Filter survives. Could be added by extendingScanPredicateto a tree, but row-group verdict combination for OR gets messier (union of MaybePresent verdicts). -
IsNotNullis conservative. We don’t have per-row-group row_count to verifynull_count < row_count. Always returns MaybePresent when stats exist; the executor’s Filter drops nulls at row level. Negligible cost. -
Cross-type predicate comparison returns MaybePresent. By design (defensive). The optimizer doesn’t construct cross-type predicates because schemas declare property types — but if a future path introduces them, this is the safety net.
-
Memtable predicate eval is row-by-row (not vectorised). The memtable is typically small (<10k rows before flush), so this is negligible vs SST decoding. Morsel-driven execution can revisit.
Open questions
-
OQ1.
predicates_selectivityreuse path: should it translateScanPredicate→Expressionand callselectivity, or have its own simpler arm? Decided: translate (single source of truth for selectivity heuristics). -
OQ2. Should
NodeByIdalso accept predicates? TodayNodeByIdis a point-lookup. Predicates on the same alias COULD be applied during the lookup. v0: no — the lookup decodes one row group with at most one row anyway; Filter on top is fine. Track as follow-up if benchmarks show a hot path. -
OQ3. Should the writer emit per-row-group HLL sketches (not just per-SST)? This would enable approx-NDV reasoning per row-group for eq pushdown. Deferred; the current per-SST HLL is sufficient for query-level cardinality estimates.
References
- Parquet 2.0 column index + offset index — Apache Parquet specification §6.2 (column index for page-level pruning).
- DuckDB’s predicate pushdown into Parquet readers (Raasveldt 2022) — modern reference implementation.
docs/rfc/002-sst-format.md§4 — stats embedded in SST.docs/rfc/008-logical-plan-ir.md— IR this RFC extends.docs/rfc/010-cost-based-optimizer.md— cost model this RFC reuses.docs/rfc/011-predicate-pushdown.md— the structural pushdown this RFC builds on.
Plan de implementación
crates/namidb-storage/src/sst/predicates.rs(~250 LoC + 18 tests):
ScanPredicateenum +RowGroupVerdict+eval_row_groupevaluator. Helpersscalar_cmp(a, b) -> Orderingandscalar_eq(a, b) -> bool(delegating to PartialOrd / PartialEq ofStatScalar).- Module
pubinsst/mod.rs. - Unit tests cubren: Eq in/out range, Lt boundary, GtEq with NULL min, IsNull positive/negative, In with single/multi values, Between, missing min/max → MaybePresent, type mismatch → MaybePresent.
crates/namidb-storage/src/sst/nodes.rs(~150 LoC + 5 tests):
NodeSstReader::scan_with_predicates(&[ScanPredicate])implementing §2 algorithm.scan()becomes a wrapper ofscan_with_predicates(&[]).- Helper
row_group_stats_for_column(rg, col_name, prop_def) -> Option<PropertyColumnStats>reusing the mapping fromcompute_property_stats. - Unit tests: predicate skips all row-groups, predicate skips
some, no predicates fall through to full scan,
IsNullwith NULL row-group survives, multi-predicate AND, no-stats fallback keeps row-group.
crates/namidb-storage/src/read.rs(~80 LoC + 3 tests):
Snapshot::scan_label_with_predicates(label, &[ScanPredicate])implementing §3 algorithm.scan_label(label)wraps it with&[].- Memtable predicate eval helper using
node_view_matches_predicates(NULL-safe 3VL). - Unit tests: memtable filtering, SST filtering, predicate over tombstoned row, ND ndv (just kidding — verifies catalog isn’t used in scan path).
crates/namidb-storage/src/sst/mod.rs+lib.rs:
pub mod predicates+ re-exportScanPredicate,RowGroupVerdict,eval_row_group.
crates/namidb-query/src/plan/logical.rs(~30 LoC + 1 test):
LogicalPlan::NodeScanaddspredicates: Vec<ScanPredicate>.- Type alias
pub use namidb_storage::sst::predicates::ScanPredicateat module root so the rest of the query crate doesn’t need to know the storage path. - Updates to
children()(no children added — predicates are flat),operator_name()(still “NodeScan”),contains_write()(still false). - Test ensures NodeScan with predicates equals NodeScan with same predicates and not equal when predicates differ.
crates/namidb-query/src/optimize/parquet_pushdown.rs(~250 LoC + 14 tests):
classify_pending_for_scan(pending: Vec<Expression>, alias: &str, label_def: &LabelDef) -> (Vec<ScanPredicate>, Vec<Expression>).- Conversion
try_into_scan_predicate(expr, alias) -> Option<ScanPredicate>case-analysing eachExpression::kind. Supports: BinaryOp {Eq/Lt/LtEq/Gt/GtEq} withPropertyAccess(alias, prop)on one side andLiteral(lit)on the other; the literal converts toStatScalarvia a helper.IS NULL / IS NOT NULL.IN [list]when every element is a literal.BETWEENdecomposes to Gte+Lte at lowering time so the AND-split already gives us two conjuncts. - Tests: eq pushable, eq with non-matching alias rejected, eq with literal-on-left, range pushable, IS NULL pushable, IS NOT NULL pushable, IN with all literals, IN with non-literal rejected, cross-alias rejected, non-declared property rejected, parameter rejected, complex arithmetic rejected, idempotency.
crates/namidb-query/src/optimize/pushdown.rs(~30 LoC + 4 tests):
- NodeScan arm in
pushdown_atnow consultsparquet_pushdown::classify_pending_for_scan. The non-pushable conjuncts materialise asFilterabove; the pushable accumulate intopredicates. - Tests verify: filter eq on declared prop ends up in NodeScan predicates; filter on parameter stays as Filter; filter on undeclared prop stays as Filter; filter on different alias stays as Filter (and would have been pushed elsewhere by the CrossProduct arm).
crates/namidb-query/src/optimize/mod.rs(~10 LoC):
pub mod parquet_pushdown+ re-exportclassify_pending_for_scanso tests can reach it.- The
optimizepipeline doesn’t add a new pass; the NodeScan arm change inpushdown_atcovers it.
crates/namidb-query/src/optimize/normalize.rs(~5 LoC):
recurse_childrenarm for NodeScan recurses on… nothing (NodeScan is a leaf). The change is to preservepredicateswhen the arm clones the variant — trivial.
crates/namidb-query/src/exec/walker.rs(~10 LoC):
execute_node_scancallsite (line ~140) passespredicatestosnapshot.scan_label_with_predicates(label, predicates).await?.
crates/namidb-query/src/cost/cardinality.rs(~40 LoC + 3 tests):
- NodeScan arm applies multiplicative selectivity over predicates
using the existing
selectivity::selectivityandBindingStatsmachinery. - Tests: NodeScan with eq predicate estimate drops below base; NodeScan with range predicate estimate proportional to range; NodeScan with empty predicates equals base.
crates/namidb-query/src/plan/explain.rs(~50 LoC + 2 tests):
write_headerarm for NodeScan with predicates renders as §6. Predicate Display usesformat_scan_predicate(p, alias)helper, also unit-tested.
crates/namidb-query/src/plan/lower.rs(~10 LoC):
- Lowering creates
NodeScan { predicates: vec![] }. Mecánica; sin test new.
crates/namidb-query/tests/cost_smoke.rs(+8 integration tests):
parquet_pushdown_moves_eq_to_scanparquet_pushdown_moves_range_to_scanparquet_pushdown_keeps_cross_alias_in_filterparquet_pushdown_keeps_undeclared_property_in_filterparquet_pushdown_renders_in_explainparquet_pushdown_estimate_drops_below_full_scanparquet_pushdown_executes_with_parity_to_rawparquet_pushdown_skips_all_row_groups_when_out_of_range
Snapshot esperado:
cargo test --workspace --exclude namidb-py: 528 → ~580 passed.cargo clippy --workspace --all-targets --exclude namidb-py -- -D warnings: clean.cargo fmt --all -- --check: clean.- LoC nuevo: ~900 src + ~500 tests + ~650 RFC.