Row Counts Lie
Column Control Totals Catch The Drift A Row Count Misses
June 30, 2026
The reconciliation check most warehouses ship with is "did the same number of rows land today as yesterday?" It feels safe because it's cheap and it's almost always green. It's also the check that misses the failures that actually hurt: a join that fans out and then dedupes back to the same cardinality, a decimal column that got truncated on the way in, a currency field that came through scaled by 100, a late-arriving correction that swapped one row for another. Same count. Different money.
Row counts answer "how many?" The questions a finance team asks are "how much?" and "is it the same how-much as the system of record?" Those are column questions, and a count cannot answer them. You need a control total: a sum over a meaningful column that you can compare across two loads, two layers, or two systems and have it disagree out loud when the data drifts.
Why a count goes green when the data is wrong
Think about what survives a row-count check. Cardinality is preserved by a surprising number of bad transformations. A LEFT JOIN that should have been an INNER quietly nulls out a measure without changing the count. A SCD load that updates in place instead of inserting keeps the count flat by construction. A unit-of-measure mistake (cents vs dollars, grams vs kilograms) never touches the count at all. None of these trip a row-count alarm, and all of them change the number a stakeholder reads off a report.
The honest version of reconciliation compares the content of a column, not the shape of the table. The cheapest content comparison that scales is a sum. If the sum of OrderAmount in your staging layer equals the sum in your warehouse layer, every dollar that entered is accounted for, even if the row counts differ because of dedup or aggregation. When the sums disagree, you have a real, dollar-denominated discrepancy to chase, not a vague feeling that something looks off.
The catch is that control totals are only useful if they're captured the same way every run, stored where you can query them, and pinned to the exact execution that produced them. A spreadsheet of sums someone runs by hand on Monday is not a control. An audit table the platform writes on every load is.
What the deployed catalog actually records
When you run BimlFlex pipelines, the operational metadata lands in the BimlCatalog database you deploy and own. It's a normal SQL Server database delivered as a DACPAC, sitting in the bfx schema, queryable with the same SQL you use for everything else. Two parts of that schema turn "row counts lie" from a complaint into a control you can enforce.
The first is row-count auditing that carries a column sum alongside the count. Every audited count is tagged to an execution and an object, and it has room for both the count and a control total:
CREATE TABLE [bfx].[RowCount](
[RowCountID] BIGINT IDENTITY (1, 1) NOT NULL,
[ExecutionID] BIGINT NOT NULL,
[ComponentName] NVARCHAR(200) NOT NULL,
[ObjectName] NVARCHAR(200) NOT NULL,
[CountType] NVARCHAR(20) NOT NULL,
[RowCount] INT NOT NULL,
[ColumnSum] DECIMAL(38, 4) NULL,
[ColumnName] NVARCHAR(500) NULL,
[AuditDate] DATETIME DEFAULT (GETDATE()) NULL
);
The shape is the whole argument. RowCount and ColumnSum live side by side, both stamped with the same ExecutionID, ComponentName, ObjectName, and CountType. ColumnName records which column the sum was taken over, so a single object can carry more than one control total (amount, quantity, whatever matters). And the sum is DECIMAL(38, 4): thirty-eight digits of precision so a sum over millions of high-value rows doesn't overflow or round, four decimal places so currency and rates survive intact. That type choice is the difference between a control total you can trust to the cent and one that quietly loses precision at scale.
There's also a covering index over ExecutionID, ComponentName, ObjectName, and CountType, which is exactly the key you'd group on to pull a source-vs-target pair for one run. The table is built to be queried for reconciliation, not just to accumulate.
From "looks off" to a one-line query
Because the count and the control total share an execution and an object, the reconciliation you actually want is a single grouped query against your own catalog. Pull the latest execution, line up the source count type against the target, and let the sums speak:
SELECT
rc.ObjectName,
SUM(CASE WHEN rc.CountType = 'Source' THEN rc.[RowCount] END) AS SourceRows,
SUM(CASE WHEN rc.CountType = 'Target' THEN rc.[RowCount] END) AS TargetRows,
SUM(CASE WHEN rc.CountType = 'Source' THEN rc.ColumnSum END) AS SourceSum,
SUM(CASE WHEN rc.CountType = 'Target' THEN rc.ColumnSum END) AS TargetSum
FROM bfx.[RowCount] rc
WHERE rc.ExecutionID = @ExecutionID
AND rc.ColumnName = 'OrderAmount'
GROUP BY rc.ObjectName
HAVING SUM(CASE WHEN rc.CountType = 'Source' THEN rc.ColumnSum END)
<> SUM(CASE WHEN rc.CountType = 'Target' THEN rc.ColumnSum END);
Any row this returns is a load where the dollars moved even if the rows did not. The two loads can have identical counts and still show up here, because the HAVING clause compares the sums, not the cardinality. That's the whole point: a drifted total is now a queryable failure with a name, an object, and an amount attached, the kind of thing you can wire into an alert or a gate instead of discovering it in a quarter-end review. It pairs naturally with the runtime checks and triage flow we covered in testing and observability as code, and it gives a data contract's validation gate something concrete to assert on: not "the row count is plausible" but "the control total reconciles to the source."
When the sum disagrees, get the cell
A control total tells you a load is wrong. It doesn't tell you which row is wrong. For that, the catalog keeps a second pair of tables that capture the actual audited rows down to the cell. One records the audited row event with its schema, tied to the same execution and object:
CREATE TABLE [bfx].[AuditRow](
[AuditRowID] BIGINT IDENTITY (1, 1) NOT NULL,
[ExecutionID] BIGINT NOT NULL,
[ComponentName] NVARCHAR(200) NOT NULL,
[ObjectName] NVARCHAR(200) NOT NULL,
[AuditType] NVARCHAR(20) NOT NULL,
[RowCount] INT NULL,
[AuditRowSchema] XML NOT NULL,
[AuditDate] DATETIME DEFAULT (GETDATE()) NULL
);
The companion table breaks each captured row into one entry per column value, keyed by row and column name:
CREATE TABLE [bfx].[AuditRowData](
[AuditRowID] BIGINT NOT NULL,
[RowID] INT NOT NULL,
[ColumnName] NVARCHAR(128) NOT NULL,
[ColumnValue] NVARCHAR(4000) NULL
);
This is the level a row count can never reach. AuditRowData is cell-level: for a captured row you get the column name and the value it held, so you can reconstruct exactly what came through. The control total flags the execution; the cell-level audit lets you open it up and see the row whose ColumnValue doesn't match the source. Drift detection and root cause live in the same schema, on the same ExecutionID, so there's no second tool to reach for and no gap between "something's wrong" and "here's the row."
You own the evidence
The part that's easy to undersell: this is your database. The catalog deploys into your environment, the bfx tables are yours to query, index, retain, and report on, and the audit history accumulates as a record you control rather than telemetry that lives in someone else's console. The metadata model already maps which columns carry meaning, so wiring a control total to the right column isn't a separate modeling exercise (the same mapping discipline that makes column-level lineage work). And because the catalog is the integration point the pipelines already write to, the reconciliation evidence shows up as a byproduct of running, not as extra instrumentation you have to build, which is most of what the dev diary on integrating with the BimlCatalog is about.
Row counts will keep going green. That's fine; they're a coarse smoke test and they catch the gross failures. But "the same number of rows arrived" was never the claim anyone actually cared about. The claim is "the same amount of money arrived, and it matches the source." A column control total in a table you own is how you make that claim checkable, and a drifted sum is how you find out before someone else does.