Paxos offers automation of the confirmation and settlement processes in the commodities and FX spaces through its Post-Trade service. Automation is another word for technology, and the technology powering Paxos’ Post-Trade is our innovative trade matcher, Kevin.
A trade is an exchange of goods or services between two parties with agreed-upon economic details. For a trade to be accepted, the counterparties must have the same terms. For example, if Bank A wants to sell 1 unit of gold to Bank B for $1000, then for the trade to settle, Bank B must agree that it wants to buy 1 unit of gold from Bank A for $1000.
To bring automation to trade confirmations, we built technology that searches for a resting trade that has the opposite terms of a selected trade (like the example above). That sounds simple, but in practice, match criteria can be more complicated. Matching can introduce variables like price tolerance, location or other criteria.
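Kevin’s real match criteria are richer than this, but the core opposite-terms check from the example above might look like the following minimal sketch, with a price tolerance thrown in. The field names here are illustrative, not Kevin’s actual schema:

```python
def matches(a, b, price_tolerance=0):
    """A resting trade b matches a selected trade a when the economic
    details line up and the sides and parties are mirrored."""
    return (a["instrument"] == b["instrument"]
            and a["quantity"] == b["quantity"]
            and abs(a["price"] - b["price"]) <= price_tolerance
            and a["side"] != b["side"]
            and a["party"] == b["counterparty"]
            and a["counterparty"] == b["party"])

# Bank A sells 1 unit of gold for $1000; Bank B buys the same.
sell = {"party": "Bank A", "counterparty": "Bank B", "side": "sell",
        "instrument": "gold", "quantity": 1, "price": 1000}
buy  = {"party": "Bank B", "counterparty": "Bank A", "side": "buy",
        "instrument": "gold", "quantity": 1, "price": 1000}
```

A nonzero `price_tolerance` is one example of the extra variables mentioned above; location or other criteria would be additional predicates of the same shape.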
How Kevin Works
In Kevin, all trades are recorded in a `trades` table that is append-only. Rows and entry fields are never deleted. The only time rows are updated is to keep track of which row is the most recent version of a trade. Every “update” to a trade is represented by an insert of a new row.
The most important columns in the `trades` table are `trade_id`, `version`, and `latest`. Each row is unique by `(trade_id, version)`. The most recent version has `latest = true` while the outdated entries show `latest = false`. We use PostgreSQL, but any relational database with transactions can enforce this invariant trivially.
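A minimal sketch of that invariant, using SQLite through Python’s `sqlite3` so it runs self-contained (the post uses PostgreSQL; columns beyond `trade_id`, `version`, and `latest` are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trades (
        trade_id TEXT    NOT NULL,
        version  INTEGER NOT NULL,
        latest   BOOLEAN NOT NULL,
        price    INTEGER NOT NULL,
        UNIQUE (trade_id, version)  -- each version of a trade recorded exactly once
    )
""")
# At most one latest = true row per trade, via a partial unique index.
conn.execute("CREATE UNIQUE INDEX one_latest ON trades (trade_id) WHERE latest")

def append_version(conn, trade_id, version, price):
    # One transaction: retire the old latest row, then insert the new version.
    # A constraint violation rolls back both statements.
    with conn:
        conn.execute(
            "UPDATE trades SET latest = FALSE WHERE trade_id = ? AND latest",
            (trade_id,))
        conn.execute(
            "INSERT INTO trades (trade_id, version, latest, price) "
            "VALUES (?, ?, TRUE, ?)",
            (trade_id, version, price))

append_version(conn, "foo", 1, 1000)
append_version(conn, "foo", 2, 995)
```

Attempting to re-insert an existing `(trade_id, version)` pair raises a constraint error and the whole transaction rolls back, so the old `latest` row is never left orphaned.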
If your DB supports uniqueness constraints and/or unique indexes, this can also be made thread-safe.
Two clients can race to update the same trade, and whoever is second to attempt an insert will hit the uniqueness constraint with values `('foo', 6)` and fail.
Two Kevin clients running in parallel attempt to create version 6 of ‘foo’ at the exact same time. One of them will fail on the uniqueness constraint.
Note that it’s also safe to read a specific int and insert int + 1 as the `version`, because the transaction will fail unless you’re inserting a version greater than every other one (the incremental version invariant has been true for every row inserted).
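The race described above can be sketched in the same SQLite-backed style (again a stand-in for the post’s PostgreSQL setup):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trades (
        trade_id TEXT    NOT NULL,
        version  INTEGER NOT NULL,
        UNIQUE (trade_id, version)
    )
""")
conn.execute("INSERT INTO trades VALUES ('foo', 5)")

def read_next_version(conn, trade_id):
    # Read-side computation of version + 1. The unique constraint, not this
    # read, is what makes the subsequent insert safe.
    (current,) = conn.execute(
        "SELECT MAX(version) FROM trades WHERE trade_id = ?", (trade_id,)
    ).fetchone()
    return current + 1

# Two clients both observe version 5 and race to insert version 6.
v1 = read_next_version(conn, "foo")
v2 = read_next_version(conn, "foo")
conn.execute("INSERT INTO trades VALUES ('foo', ?)", (v1,))      # first insert wins
try:
    conn.execute("INSERT INTO trades VALUES ('foo', ?)", (v2,))  # second one fails
except sqlite3.IntegrityError:
    pass  # the loser re-reads and retries with a fresh version
```

The loser’s only correct recovery is to re-read the current version and try again, which is exactly what the incremental version invariant guarantees is safe.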
A consumer is anyone subscribed to the feed of match updates from Kevin. Each match update also has Kevin’s `version` attached to it. Note that the `version` is an implementation detail of Kevin and doesn’t necessarily map one-to-one with anyone’s mental model of a trade or its history. For example, would a client consider an update to a trade with no changes to its economic details a version bump? Maybe, maybe not.
So why do we bother showing an implementation detail to consumers? Because it allows them to implement idempotent updates. Consumers can guard against out-of-order updates, or receiving the same update more than once, by adding a similar uniqueness constraint on `(trade_id, version)` in their own tables. Repeat updates will fail the uniqueness constraint. This failure can be checked for (or ignored, if your DB supports that) and we can be confident the update was already applied.
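A sketch of the consumer side, under the same SQLite-for-illustration assumption (`consumer_trades` and its columns are made-up names; in PostgreSQL the “ignore” variant is available as `INSERT ... ON CONFLICT DO NOTHING`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE consumer_trades (
        trade_id TEXT    NOT NULL,
        version  INTEGER NOT NULL,
        details  TEXT,
        UNIQUE (trade_id, version)  -- the consumer-side guard described above
    )
""")

def apply_update(conn, trade_id, version, details):
    """Insert the update; report whether it was new or a repeat delivery."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO consumer_trades VALUES (?, ?, ?)",
                (trade_id, version, details))
        return True   # applied for the first time
    except sqlite3.IntegrityError:
        return False  # already applied: safe to ignore

first  = apply_update(conn, "foo", 3, "matched")
second = apply_update(conn, "foo", 3, "matched")  # duplicate delivery
```

Because the constraint makes the insert idempotent, the feed can deliver an update any number of times without the consumer double-processing it.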
Consumers can choose a strategy for out of order updates. Some options are:
- Drop any versions less than the greatest you have. This will leave gaps > 1 in your versions that will never be resolved.
- Wait to change your `latest = true` row until you have a contiguous run of subsequent versions. This is useful when you’re reacting to transitions between trade details (e.g. send an email on match) and want to play out every update as if it was received in order.
We tell the `lead` window function to select the version in front of us within our `root_id` and set it to `next_version`. We then exclude rows where `next_version` is NULL (we’re the most recent), or `next_version` is no greater than 1 version ahead. We also need to check behind us for the edge case where we never received a trade with `version = 1`. tl;dr: if any rows come back from this query, we’re missing an update.
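The original post showed the query this paragraph describes; here is a reconstruction of it under stated assumptions: SQLite via `sqlite3` for a runnable demo, a made-up `consumer_trades` table, and `root_id` as the grouping key the text names (window functions need SQLite ≥ 3.25):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE consumer_trades (root_id TEXT NOT NULL, version INTEGER NOT NULL)")
conn.executemany("INSERT INTO consumer_trades VALUES (?, ?)", [
    ("foo", 1), ("foo", 2), ("foo", 4),  # version 3 was never received
    ("bar", 2),                          # version 1 was never received
])

gaps = conn.execute("""
    SELECT root_id, version, next_version FROM (
        SELECT root_id, version,
               LEAD(version) OVER (PARTITION BY root_id ORDER BY version) AS next_version,
               MIN(version)  OVER (PARTITION BY root_id) AS first_version
        FROM consumer_trades
    )
    WHERE next_version > version + 1  -- the row ahead of us skips at least one version
       OR first_version > 1           -- edge case: we never received version = 1
""").fetchall()
# Any rows coming back from this query mean we're missing an update.
```

Rows whose `next_version` is NULL drop out of the first condition automatically, since a NULL comparison is not true, which matches the “we’re the most recent” exclusion above.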
Nothing is free. There are costs to an append-only table. One is that old versions build up, but that’s easy to deal with: old rows can be moved to cold storage. For our use case, we define old as more than 1 month old and `latest = false`. Another cost is the impact on write performance: indexes negatively impact insert speed. In practice, we have not observed any noticeable difference before and after adding our unique indexes. We’ve implemented this append-only strategy in other tables and performance has remained the same.
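That archival policy can be sketched as a copy-then-delete inside one transaction (SQLite again standing in for PostgreSQL; `created_at` and the `trades_cold` table are illustrative names, not Kevin’s schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades      (trade_id TEXT, version INTEGER, latest BOOLEAN, created_at TEXT);
    CREATE TABLE trades_cold (trade_id TEXT, version INTEGER, latest BOOLEAN, created_at TEXT);
""")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", [
    ("foo", 1, False, "2020-01-05"),
    ("foo", 2, False, "2020-03-01"),
    ("foo", 3, True,  "2020-03-01"),  # latest rows stay hot no matter how old
])

with conn:  # copy then delete in one transaction so no row is lost mid-move
    conn.execute("""
        INSERT INTO trades_cold
        SELECT * FROM trades
        WHERE latest = FALSE AND created_at < date('now', '-1 months')
    """)
    conn.execute(
        "DELETE FROM trades WHERE latest = FALSE AND created_at < date('now', '-1 months')")
```

Only superseded rows ever move: the `latest = FALSE` predicate keeps the current version of every trade in the hot table regardless of its age.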
Building our matcher on top of an essentially immutable SQL table was not always an obvious choice. Some concerns from our team members when we first proposed Kevin as a solution included:
- It is not intuitive to update without actually using an SQL UPDATE statement
- The cruft built up by inserting rows so frequently
- The complexity of multiple statements required for almost every transaction
- You could get the same benefits simply by writing to a log table
For points 1–3, we have not experienced a negative impact. Point 4 is debatable: a separate log table can drift from the main table, and to replay a trade from earlier in its life you would need a mechanism to bring logs back into the main table.
Having an immutable data store that doesn’t require any maintenance beyond unique index constraints has given us a competitive advantage. We can respond quickly to ad-hoc customer requests like “can you change our trade contract to X for the next few days and apply it retroactively since yesterday?” because we can replay trades from any point in time.
Kevin has given us the advantages of functional programming at the database level. Our application code is predictable and repeatable even when hitting the persistence layer, which is not a luxury we had with our previous, typical CRUD strategy. If you want to fulfill mission critical business requirements while actually increasing your iteration speed, consider making your data store immutable.