At Enova we take our data seriously. One of the important aspects of data is it’s lineage – Where did it come from? When was it created? What changes did it undergo? Who made those changes and when were they made? Systems that manage this metadata have many names, shapes and forms including but not limited to:
- Change data capture
- History tracking
- Change tracking
Change tracking within a PostgreSQL database is often implemented using table-level triggers to populate “history tables”. This process is straightforward to implement and provides an easy way to track the history of all changes made to data within the database. This approach has the significant advantage of providing the ability to query, filter, aggregate and generally process both the parent and history data with all the power of an RDBMS like PostgreSQL. However, recording these changes within the database can be quite expensive in terms of the operational overhead.
Now, what if we wanted to do this for an entire database? What if we have a shadow of our database within itself that tracks changes made against data in every table? Moreover, what if we wanted to be able to do this for any database? Operational overheads and avoiding maintenance due to schema evolution become prime considerations, especially for OLTP databases that have a lot of traffic and see significant churn in structure.
This is what we’ve created — and are giving to the community — in audit_star.
With the following design criteria in mind, we wish to be able to transactionally audit changes made to our data within the database.
- Low impact on transactional throughput
- Conservative disk usage
- Resilient to schema mutation and minimal maintenance
- Easy to deploy & re-deploy
- User friendly
audit_star is an in-database change tracking system for PostgreSQL written as a Go binary. It is invoked as a command line utility with minimal configuration against a PostgreSQL database and it sets up all the structure necessary to track changes being made to the data. It creates an history table for every source table to record data changes and uses standard PostgreSQL triggers to populate them.
The figure below identifies the salient components of the system. Apart from an audit table corresponding to every source table, and the triggers, audit_star also creates three views (delta, snapshot, and compare) to represent the audited data in a tabular form for non-power users who might not be familiar with the PostgreSQL DSL to deal with JSON data. The data presented by the views are fully typed in the image of the source table. As shown in the picture, the three views present increasingly more data, but also incur increasing hits in performance due to the complexities of the underlying query.
Here are a few simple use cases to demonstrate what audit_star looks like in action. Let us consider a simple table named test in a schema called schema with a structure as shown below. When we run audit_star, it creates a schema called schema_audit_raw and a history table called test_audit in schema_audit_raw to record the changes made to data in test. Note that audit_star created trigger functions on the test table to record the changes to data in the table.
The figure below shows audit_star capturing the effect of the three SQL statements (INSERT/UPDATE/DELETE) on data in the test table. It is worth noting that no actual data is captured on insert as an optimization. This will be explored in detail in the next section.
The figure below shows the same captured change set, but through the lenses of the three views – delta, snapshot, and compare. Note how each of the views provide progressively more data as suggested by their names.
Assessment of solution against design criteria
The section elaborates on the design decisions made in writing audit_star to meet each of our design criteria.
Resilient to schema mutation and minimal maintenance
One of most important costs of software engineering is maintenance, especially for the adoption of a new tool. We wanted a solution that was deploy-and-forget with minimal moving parts. Since the tooling is fully contained within the database, DevOps maintenance amounts to nothing because without the database up and running, there is nothing to audit. The more important factor for us was schema resilience because at Enova, we have significant amounts of schema churn on a daily basis. Any auditing solution that was meant to be cross-cutting had to be immune to these schema changes to be viable. We addressed this problem by using semi-structured storage in the form of JSONB to record the change set. We chose to use JSONB over plain JSON to allow ourselves the option of creating indices on the fly if necessary, even if it meant that we would have to sustain the overhead of marshaling the data at store time.
As a simple demonstration of this schema resilience, consider the table from the walkthrough example in the previous section. In the figure below we alter the structure of the table in lock-step with making DML changes to show how the audit system captures all changes while being unaffected by the DDL.
Low impact on transactional throughput
The crucial component of the system that determines impact on transactional throughput is the trigger function. The trigger function is the only “extra” code that is to be executed on every DML operation to record the change being made. Thus the design of this function is critical to ensure that transaction rate is not adversely affected. Here are some of the things we learnt about optimizing for transaction throughput along the way:
- A B-Tree index costs roughly about 8-10% in TPS (transactions per second), a measure of transactional throughput.
- A GIN/GiST index over JSONB can costs about 80% in TPS, but is of course highly dependent on the size of the payload being inserted.
- Exception handling blocks (savepoints) in PL/PGSQL cost around 4% in TPS.
- Executing dynamic SQL (construct a string and execute it) costs around 30% in TPS for simple inserts.
Keeping these points in mind, the design decisions we made were:
- A single B-Tree index on the primary key column of the audit table so that it can be joined to easily.
- No GIN/GiST index, but record the change as JSONB so that we have an option to index (if needed partially) temporarily if and when necessary.
- No exception handling blocks – not just because of the impact on TPS, but because a true audit system should not be allowing a change to happen if it cannot record it
- No dynamic SQL because of the heavy impact on TPS. The consequence of this was that we could have a single trigger function for all tables, but that was something that could easily be meta-programmed.
Many variations of the trigger function were benchmarked using pg_bench on production grade hardware and the TPC-B benchmark. The final product incurs about a 30% TPS overhead.
Conservative disk usage
Given that we’re recording any and all changes for majority of our data, we wanted to be conservative in terms of disk usage. Aggressive usage would mean that our SANs would fill up faster requiring more frequent rotations, and logging more data would also adversely impact both TPS and query time. To keep our records lean, we decided to log the bare minimum of just the change set by diffing what has changed. In addition to this, we noted that redundantly re-capturing data about newly inserted rows caused significant amount of bloat, so we only capture the event and not the data itself. The data is only captured once it is updated or deleted. This has the desirable side benefit of reducing the impact on TPS for INSERT statements since the amount of data being recorded is significantly lower.
Easy to deploy and re-deploy
Keeping with the theme of simplicity, we wanted something that didn’t require a lot of configuration and preparation when it came to deployments. To this end, audit_star is written as a command line binary in Go. The binary can be cross-compiled for deployment on any platform/architecture. The tool is also idempotent. Running it twice would have no adverse effects and thus it becomes effective to run it after every deployment. This ensures that if any changes were made to the database structure such as adding new tables, they are also setup for auditing at the same time.
The big challenge when it comes to semi-structured data in an RDBMS is that it is quite unconventional and can be hard to wrap one’s mind around. As mentioned before, audit_star three views (delta, snapshot, and compare) to represent the audited data in a tabular form to ameliorate this and any other confusion that users might have dealing with JSON data in PostgreSQL.
We made this tool open-source. The repository is self-documented with instructions on how to setup and deploy. We also have identified candidates for optimization with each new PostgreSQL release such as BRIN indices (PostgreSQL 9.5) for indexing monotonically self-ordered data such as time and transition trigger tables (PostgreSQL 10.0) for reducing the impact of context switches on TPS using statement level triggers. As we continue to maintain and develop this tool, we would love for the community to get involved with this project through feedback and contributions.