
A single balance column seemed like the obvious choice for a payments API. It took a production incident, two days of manual reconciliation, and a full architecture rewrite to understand why it wasn't. Some engineering mistakes fail immediately. Unit tests break, monitoring alerts fire, and the deployment gets rolled back within minutes. Others are far more expensive. They work perfectly at first, surviving testing, passing code reviews, and even performing well in production. The problem only appears later, when the system needs to answer a question it was never designed to answer. This is the story of one of those mistakes. The system worked perfectly, right up until it needed to answer a question we'd never thought to ask: how did we get here? \ While building a payment API for a fintech product, we made a decision that seemed completely reasonable at the time. We stored account balances directly on the account record and updated them whenever money moved between accounts. The schema looked like this: CREATE TABLE accounts ( id UUID PRIMARY KEY, user_id UUID NOT NULL, balance DECIMAL(19,4) NOT NULL DEFAULT 0, currency VARCHAR(3) NOT NULL, updated_at TIMESTAMP NOT NULL DEFAULT NOW() ); Transfers were straightforward: def transfer(sender_id: str, recipient_id: str, amount: Decimal, db: Session): sender = ( db.query(Account) .filter(Account.id == sender_id) .with_for_update() .first() ) recipient = ( db.query(Account) .filter(Account.id == recipient_id) .with_for_update() .first() ) if sender.balance < amount: raise InsufficientFundsError() sender.balance -= amount recipient.balance += amount db.commit() The implementation was simple, easy to understand, and it performed well. For a while, it appeared to be the correct solution. Why the Design Seemed Reasonable Looking back, the decision wasn't reckless. Reading a balance required a single database query, and updating one required only a transaction with row-level locking. From a performance perspective, it was efficient, and from a product perspective it worked. The problem was not correctness. The problem was auditability. We had accidentally modelled money as a mutable state rather than a sequence of financial events, and that distinction would eventually become very expensive. What We Lost Without Realising It At first, we believed we had history because we also maintained a transactions table. CREATE TABLE transactions ( id UUID PRIMARY KEY, sender_id UUID NOT NULL, recipient_id UUID NOT NULL, amount DECIMAL(19,4) NOT NULL, status VARCHAR(20) NOT NULL, created_at TIMESTAMP NOT NULL DEFAULT NOW() ); On paper, this looked sufficient. It wasn't. The transactions table recorded what we intended to happen, while the accounts table recorded the current state. What we lacked was a complete, auditable record of every balance-changing event, and more importantly, we had no reliable way to reconstruct balances from scratch and verify that the stored balances were correct. That weakness remained invisible until production exposed it. The Incident That Changed Everything Several months after launch, a bug appeared in a batch payout process. A race condition caused a small number of payouts to be processed twice. The transactions table showed both debits, and the balances reflected both debits. Customers were affected. The problem wasn't identifying that something had gone wrong. The problem was determining exactly what had happened: which accounts were impacted, how much money was affected, whether there were additional discrepancies elsewhere, and whether we could reproduce the balance calculations independently. The answer to that last question was no. We spent two days writing reconciliation queries, exporting spreadsheets, and manually validating balances. The system could tell us the current balance, but it could not explain how that balance had been reached. That was the moment we realised our mental model was wrong. The Missing Insight A balance is not the source of truth. A balance is the result of everything that has happened before it. Financial systems are fundamentally different from most CRUD applications. A user's profile name is state. An account balance is history. The balance you see is simply a projection of financial events that occurred over time, and this is a well-established idea in fintech infrastructure: Stripe's own ledger, for instance, models money movement as an immutable log of events rather than as a number that gets overwritten. Once we understood that, the design changed completely. Thinking in Ledgers Instead of State Instead of updating balances directly, every financial event becomes a ledger entry. The ledger becomes the source of truth, and balances become derived values. Our accounts table became almost static: CREATE TABLE accounts ( id UUID PRIMARY KEY, user_id UUID NOT NULL, currency VARCHAR(3) NOT NULL, created_at TIMESTAMP NOT NULL DEFAULT NOW() ); And we introduced a ledger table: CREATE TABLE ledger_entries ( id UUID PRIMARY KEY, account_id UUID NOT NULL REFERENCES accounts(id), amount DECIMAL(19,4) NOT NULL, entry_type VARCHAR(50) NOT NULL, reference_id UUID NOT NULL, created_at TIMESTAMP NOT NULL DEFAULT NOW() ); CREATE INDEX idx_ledger_account_created ON ledger_entries(account_id, created_at); Every debit and credit became a permanent record. Nothing was updated, nothing was deleted, and everything was appended. The shift in mental model is easiest to see side by side: | | Mutable balance (before) | Ledger (after) | |----|----|----| | Source of truth | accounts.balance | ledger entries | | How balance is found | Read directly | Sum of all entries | | Historical balance | Not possible | as of query, always accurate | | Reconciliation | Manual investigation | Single query | | Failure mode | Silent drift | Visible discrepancy | | Corruption recovery | Unrecoverable | Rebuildable from history | Computing Balances from the Ledger Balance calculation became a simple aggregation. def get_balance( account_id: str, db: Session, as_of: datetime = None ) -> Decimal: query = ( db.query(func.sum(LedgerEntry.amount)) .filter(LedgerEntry.account_id == account_id) ) if as_of: query = query.filter( LedgerEntry.created_at <= as_of ) result = query.scalar() return result or Decimal("0") This immediately unlocked something we previously couldn't do. We could ask what an account's balance was six months ago and receive an accurate answer, with no special reporting tables, no snapshots, and no manual reconstruction: balance = get_balance( account_id, db, as_of=datetime(2025, 12, 31) ) Just the ledger. Transfers Became Ledger Entries Instead of updating balances directly, transfers produced matching debit and credit records. def transfer( sender_id: str, recipient_id: str, amount: Decimal, db: Session ): sender_balance = get_balance(sender_id, db) if sender_balance < amount: raise InsufficientFundsError() transfer_ref = uuid4() db.add( LedgerEntry( account_id=sender_id, amount=-amount, entry_type="transfer_debit", reference_id=transfer_ref ) ) db.add( LedgerEntry( account_id=recipient_id, amount=amount, entry_type="transfer_credit", reference_id=transfer_ref ) ) db.commit() return transfer_ref The important detail is that the balance itself never changes. Only the history changes. The balance is always derived. The Idempotency Problem We Also Discovered The payout incident revealed another issue: the root cause was not only the absence of a ledger, but also weak idempotency controls. Financial systems must assume retries will happen, since network failures occur, workers crash, and queues redeliver messages. Without idempotency, the same payment can be processed multiple times. We introduced unique payment references to guarantee that the same financial event could never be recorded twice. ALTER TABLE ledger_entries ADD CONSTRAINT unique_reference UNIQUE(reference_id, entry_type); This solved a different class of problem. The ledger provided visibility, while idempotency prevented duplicate execution. Both were necessary. Reconciliation Became Trivial One of the biggest benefits appeared almost immediately. Reconciliation changed from an investigation to a query. def reconcile_account( account_id: str, expected_balance: Decimal, db: Session ): actual = get_balance(account_id, db) return actual == expected_balance Any discrepancy became visible, any balance could be independently verified, and any historical state could be reproduced. That alone justified the migration effort. The Migration Strategy We couldn't switch overnight, so instead we ran both systems in parallel. First, we backfilled the ledger with a one-time script: it walked the existing transactions table in chronological order and, for each row, generated a matching debit and credit entry, preserving the original timestamps so historical balance queries would still be accurate. Then we reconciled every account. Most balances matched, but some didn't, and those discrepancies revealed bugs we had never noticed before. For six weeks, both systems operated simultaneously. Eventually, all balance reads moved to the ledger, and the balance column remained only as a verification mechanism. After two months of clean reconciliation reports, we removed it entirely. But Isn't This Slower? Yes. Reading a single balance column is faster than scanning ledger entries, and no serious engineer disputes that. The solution is not to abandon the ledger, but to separate truth from optimisation. We introduced a balance projection table. CREATE TABLE account_balance_cache ( account_id UUID PRIMARY KEY, balance DECIMAL(19,4) NOT NULL, last_entry_id UUID NOT NULL, updated_at TIMESTAMP NOT NULL ); The cache exists for performance, and the ledger exists for correctness. If the cache becomes corrupted, it can be rebuilt. If the ledger becomes corrupted, you have a much larger problem. That distinction matters. The Most Expensive Database Column We Ever Maintained The most expensive database column we ever maintained was accounts.balance. Not because it was difficult to update, and not because it was slow, but because it encouraged us to think about money as state instead of history. Once we started treating financial events as the source of truth and balances as projections, a surprising number of operational problems became easier to solve. Reconciliation became simpler. Auditing became simpler. Incident investigations became simpler. The architecture did not become simpler. It became more honest about the nature of the data it was managing. What We'd Tell a Team Starting From Scratch If you're early in building a payments or wallet system, the lessons we learned the expensive way are worth front-loading: Model money as events, not state. If a number can be derived from history, it shouldn't also be the thing you mutate directly. Assume retries will happen. Idempotency isn't an edge case in distributed systems, it's the default behavior you have to design against. Build reconciliation in from day one, not after an incident forces you to. A reconcile() function that's trivial to call is worth more than any amount of monitoring after the fact. Separate truth from performance. A cache that's wrong can be rebuilt. A ledger that's wrong is a much harder conversation with your customers and your regulators. None of this required exotic technology. It required treating the ledger as what it actually was all along: not an implementation detail, but the product. A balance is temporary. A ledger is permanent \ \
View original source — Hacker Noon ↗

