A common pattern in automated testing is to clear the database state before every test runs. This ensures that the tests remain robust and immune to side-effects of adjacent tests. While a clean database state for every test is beneficial in this sense, the overhead of clearing data can be significant. In one of our largest test suites comprised of thousands of tests interacting with Postgres, we reduced the run time of the suite by over 33% by switching our table-clearing SQL to use
DELETE rather than
Our team’s productivity depends on the performance of this test suite, since they run very frequently and are an integral part of the deployment pipeline. The nearly 3,200 tests included are responsible for the correctness of one of our most critical codebases. The codebase has constant activity, and changes are continuously committed by authors from multiple teams. The entire suite runs before every commit is merged, as well as after each commit lands in master.
We were surprised when we measured the overhead of clearing table data. Before each test runs, no fewer than 32 tables are wiped clean with a
TRUNCATEs accounted for 20-200 milliseconds of overhead for each test. In other words, this added 1 to 5 minutes of overhead for the entire suite when run serially. (In practice, we run groups of tests in parallel, so the perceived total overhead was closer to about 1 minute total.)
When we swapped the
DELETEs, the overhead of each test dropped to 1-3 milliseconds. This reduced the combined overhead of all the tests to a total of about 10 seconds. In the graph below, you can see that average test suite run time dropped by over 33%—from over 3 minutes to about 2 minutes.
The Difference Between TRUNCATE and DELETE
We were excited about the substantial boost to our test performance, but curious about why the switch from
DELETE helped. After all, the result of
TRUNCATE table_a; or
DELETE FROM table_a; is the same: an empty
table_a. So why would one be any faster than the other?
DELETE in Postgres do different things under the hood. From a database user perspective, the results seem the same: data is deleted; however, internally, there are major differences in the underlying data representing the table. With
DELETEs, dead rows remain in database pages and their dead pointers are still present in indices.
TRUNCATE, on the other hand, is a bit more thorough in wiping the slate clean. The resulting table looks almost identical internally to a newly
CREATEed table. There are no dead rows, the indices are empty, and the table statistics are reset. There are quite a few other differences as well, all highlighted by a terrifically detailed post by Craig Ringer.
TRUNCATE is a constant time operation while
DELETE complexity depends on the number of rows in the table.
TRUNCATE is a fast operation to quickly clear out huge tables, but
DELETE is much faster when a table contains just a handful of rows. This is why our tests saw such a significant speed improvement by the change: each test only inserts a handful of rows.
With the large size of this test suite and the heavy traffic from our engineers, a reduction in the run time of the tests has a big impact on developer happiness and productivity. This tweak is perhaps most satisfying because of the combination of significant test speedup and it’s relative simplicity. The investigation also offers interesting insights into Postgres internals and a take-away to use
DELETE on very small tables and
TRUNCATE on everything else.
If you’re interested in working with Postgres to Make the World Programmable, you’re in luck! We’re hiring!