
Engineering
April 7, 2015

Slow PostgreSQL Performance? Don't Forget to Vacuum your Database

by Marcus Gartner

Numerous web services, including Lob, rely on PostgreSQL for fast and reliable relational data storage. For simple services with a relatively low number of rows in the database, PostgreSQL’s default settings work nicely right out of the box. However, as table sizes grow into the millions of rows, more customized settings are required to ensure optimal performance. Our database’s continual and rapid growth led to degrading Postgres performance, which required us to implement aggressive auto-vacuum and auto-analyze settings. In this post I’ll go into detail about our specific problem, the tools we used to diagnose it, and how it was ultimately solved.


TL;DR

Make sure your largest database tables are vacuumed and analyzed frequently by setting stricter table-level auto-vacuum settings. Below is an example that will VACUUM and ANALYZE after every 5,000 inserts, updates, or deletes.

ALTER TABLE table_name SET (autovacuum_vacuum_scale_factor = 0.0);  
ALTER TABLE table_name SET (autovacuum_vacuum_threshold = 5000);  
ALTER TABLE table_name SET (autovacuum_analyze_scale_factor = 0.0);  
ALTER TABLE table_name SET (autovacuum_analyze_threshold = 5000);  

The Symptoms

At Lob, we’ve built an internal website to track business metrics, assist our customer support team, and monitor the order status of our Postcard API and our Letter API. Aggregating some of this data, like monthly revenue, requires complex database calls on tables containing millions of rows. We’re accustomed to these queries taking several seconds, and for the most part we’re fine with the speed, given that the site is not customer facing. Performance is not our primary concern here.

As our tables continued to grow, however, queries began to take 20 seconds or more. The internal website became unusable.

The Diagnosis

We began to diagnose the cause of the slow queries by running them on the database with different parameters and monitoring execution times. We looked for patterns to relate the types of queries to their latency. An interesting correlation between a date filter and performance surfaced. A query that fetched all rows inserted over a month ago would return in ~1 second, while the same query run on rows from the current month was taking 20+ seconds.

With this discovery, the next step was to figure out why the performance of these queries differed so much. PostgreSQL’s EXPLAIN statement was an essential tool here. When Postgres receives a query, the first thing it does is try to optimize how the query will be executed based on its knowledge of the table structure, size, and indices. Prefixing any query with EXPLAIN will print out this execution plan without actually running the query.
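As a rough sketch (reconstructed from the query plans shown below, not our exact production query), using EXPLAIN looks like this:

EXPLAIN
SELECT date(charges.date_created), count(*)
FROM charges
JOIN accounts ON charges.account_id = accounts.id
WHERE NOT charges.deleted
  AND NOT accounts.admin
  AND charges.date_created > '2015-03-16'
  AND charges.date_created < '2015-03-24'
GROUP BY date(charges.date_created)
ORDER BY date(charges.date_created);

Postgres prints the plan it would use without touching any rows; EXPLAIN ANALYZE would additionally execute the query and report actual timings.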

What EXPLAIN Told Us

When we compared the outputs of EXPLAIN on the fast and slow queries, the problem became immediately evident. When joining two tables on an indexed foreign key, Postgres was performing an efficient Hash Join for the fast running query, and an inefficient Nested Loop for the slower query.

Here’s the output for the fast query:

Sort  (cost=43509.92..43523.14 rows=5291 width=20)  
 Sort Key: (date(charges.date_created))
 ->  HashAggregate  (cost=43116.55..43182.69 rows=5291 width=20)
       ->  Hash Join  (cost=186.57..43076.07 rows=5397 width=20)
             Hash Cond: ((charges.account_id)::text = (accounts.id)::text)
             ->  Seq Scan on charges  (cost=0.00..42774.71 rows=5409 width=41)
                   Filter: ((NOT deleted) AND (date_created > '2015-03-16 00:00:00'::timestamp without time zone) AND (date_created < '2015-03-24 00:00:00'::timestamp without time zone))
             ->  Hash  (cost=121.32..121.32 rows=5220 width=21)
                   ->  Seq Scan on accounts  (cost=0.00..121.32 rows=5220 width=21)
                         Filter: (NOT admin)

And here’s the EXPLAIN output for the slow query:

GroupAggregate  (cost=42961.29..42961.32 rows=1 width=20)  
 ->  Sort  (cost=42961.29..42961.30 rows=1 width=20)
       Sort Key: (date(charges.date_created)), charges.type
       ->  Nested Loop  (cost=0.00..42961.28 rows=1 width=20)
             Join Filter: ((charges.account_id)::text = (accounts.id)::text)
             ->  Seq Scan on charges  (cost=0.00..42774.71 rows=1 width=41)
                   Filter: ((NOT deleted) AND (date_created > '2015-03-17 00:00:00'::timestamp without time zone) AND (date_created < '2015-03-22 00:00:00'::timestamp without time zone))
             ->  Seq Scan on accounts  (cost=0.00..121.32 rows=5220 width=21)
                   Filter: (NOT admin)

Note that both of these queries were identical, except for the date ranges filtering the rows.

The Cure

As mentioned above, when Postgres builds the query plan, it optimizes based on what it knows about the structure and size of the database. However, its knowledge of the database is not always up to date. Without accurate statistics about its tables, Postgres can plan suboptimal query executions. In our case, the query optimizer created slower query plans for the newest rows. This explained why the same query was fast for older rows, about which the database had accurate statistics, and slow for the newest rows.
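One rough way to gauge how stale the planner's picture is (using the charges table from the plans above as an example) is to compare the row count recorded in pg_class, which is refreshed by VACUUM and ANALYZE, against the actual row count:

SELECT reltuples::bigint AS planner_row_estimate
FROM pg_class
WHERE relname = 'charges';

SELECT count(*) AS actual_rows
FROM charges;

A large gap between the two numbers suggests the statistics the planner relies on are out of date.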

The solution was to VACUUM and ANALYZE the table. Vacuuming cleans up dead rows left behind by updates and deletes, and analyzing refreshes the statistics the query planner relies on. We saw an immediate decrease in execution time for our complex queries, and as a result, a much more user-friendly internal website.

VACUUM ANALYZE table_name;  

You can check the last time your tables were vacuumed and analyzed with the query below. In our case, we had tables that hadn’t been cleaned up in weeks.

SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze  
FROM pg_stat_all_tables  
WHERE schemaname = 'public';  

Setting Up Auto Vacuum

To prevent our tables from continually getting messy in the future and having to manually run VACUUM ANALYZE, we overrode the default auto-vacuum settings with stricter table-level ones. Postgres runs a daemon to regularly vacuum and analyze itself. By default, a table is auto-vacuumed once 20% of its rows plus 50 rows have been updated or deleted, and auto-analyzed once 10% of its rows plus 50 rows have been inserted, updated, or deleted. These settings work fine for smaller tables, but as a table grows to millions of rows, there can be tens of thousands of inserts or updates before it is vacuumed and analyzed.

In our case, we set much more aggressive thresholds for our largest tables, using the commands below. With these settings, vacuuming and analysis occur after a table sees 5,000 inserts, updates, or deletes.

ALTER TABLE table_name  
SET (autovacuum_vacuum_scale_factor = 0.0);

ALTER TABLE table_name  
SET (autovacuum_vacuum_threshold = 5000);

ALTER TABLE table_name  
SET (autovacuum_analyze_scale_factor = 0.0);

ALTER TABLE table_name  
SET (autovacuum_analyze_threshold = 5000);  
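If you want to confirm these per-table overrides took effect (table_name here is the same placeholder used above), the settings are stored in pg_class.reloptions:

SELECT relname, reloptions
FROM pg_class
WHERE relname = 'table_name';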

The threshold to auto-vacuum is calculated by:

vacuum threshold = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of rows in table  

Similarly, the threshold to auto-analyze is calculated by:

analyze threshold = autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * number of rows in table  
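To make the difference concrete, consider a hypothetical table with 1,000,000 rows. With the default settings (a threshold of 50 and a scale factor of 0.2 for vacuum), roughly 200,050 rows must change before auto-vacuum runs; with the table-level settings above, it runs after 5,000 changes:

vacuum threshold (defaults) = 50 + 0.2 * 1,000,000 = 200,050 rows
vacuum threshold (adjusted) = 5,000 + 0.0 * 1,000,000 = 5,000 rows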

Conclusion

This is one of the many challenges we’ve tackled at Lob as our traffic and business continue to grow. Since Lob is ushering the analog processes of direct mail into the digital age with our APIs, our engineering team faces a unique and rewarding set of obstacles.

For more information about PostgreSQL vacuum and analyze settings, visit: http://www.postgresql.org/docs/9.2/static/routine-vacuuming.html.
