Content By Devops .com
Every monitoring solution is limited by the information the system allows to be collected. PostgreSQL DBMS provides many metrics reflecting its internal state, running queries and background processes, and each release brings the community more monitoring-related features. PostgreSQL 13, which was released in September of 2020, added key improvements, including query sampling, improved WAL usage statistics and new system views. Let’s look at the benefits of each of these new features.
Those who regularly monitor database queries are familiar with the following parameters:
log_statement controls what SQL statements are logged
log_min_duration_statement enables logging of statements that ran for at least the specified amount of time.
These parameters help users track down unoptimized queries in applications. However, it is important to recognize that excessive log traffic may impact performance, especially when analyzing workloads with a lot of short queries. To help address this, PostgreSQL 13 introduced two new GUC parameters to allow logging of sampling statements.
log_statement_sample_rate controls the sample rate (0.0 – 1.0). That is, it allows users to control the percent of queries that are logged.
log_min_duration_sample defines the minimum required statement duration of queries to sample
Only statements with a duration exceeding log_min_duration_sample are considered for sampling. Because of this, to enable sampling, both GUCs need to be set correctly. Further, the existing log_min_duration_statement GUC has a higher priority than log_min_duration_sample, so statements with a duration exceeding log_min_duration_statement will always be logged, no matter how the sampling is configured. This means that only configurations with log_min_duration_sample < log_min_duration_statement actually sample the statements, instead of logging everything. Note that only superusers can change these settings.
Consider the following synthetic query sampling test using pgbench on a laptop.
- 1st Row: No sampling. All queries are logged
- 2nd Row: Log all queries longer than 1 second, plus 10 percent of all other queries
- 3rd Row: Log all queries longer than 1 second
In this example, sampling improves timing slightly, but since the test was run on a personal laptop, this measurement does not signify much.
However, the size difference is very interesting. Sampling helps make the logfile size approximately 10 times smaller. We can figure out that most of the queries in this test lasted less than 1 second. And if we log only long queries, the logfile remains almost empty (third row). The difference here is significant. Even if disk space consumption isn’t a concern, sampling can help speed up tools that analyze PostgreSQL logs.
PostgreSQL 13 also introduced settings to improve control over prepared statement parameter logging.
If log_parameter_max_length is set, bind parameter values will be trimmed to this many bytes.
log_parameter_max_length_on_error adds context for the failure of parameterized queries.
Previously, prepared statement parameters were never logged during errors. Note, since these logging features can impact performance, they should be used with care.
Write-Ahead Log Usage Statistics
A write-ahead log (WAL) is crucial to database system durability, and is also used for physical replication. This makes monitoring the WAL important. PostgreSQL 13 allows users to explore WAL usage per query using the EXPLAIN command. The following example shows that the insertion of a thousand rows generates 1001 records and takes about 1 Mb of space. Note that auto_explain also shows this statistic.
It is also possible to find WAL usage statistics in autovacuum log messages. However, this works only when autovacuum logging is enabled with the log_autovacuum_min_duration parameter.
Probably the most useful place to see WAL statistics for a monitoring plugin is in pg_stat_statements. PostgreSQL 13 has new columns that show WAL usage statistics per query.
Another addition to pg_stat_statements is a group of fields to track planning time. This feature is disabled by default because it can cause a noticeable decrease in performance. When enabled, it shows total, minimal, maximal and mean planning time. The pg_stat_statements.track_planning parameter enables this feature. It is important to note that this feature breaks backward compatibility. Previously, existing “time” fields — min_time, max_time, total_time and mean_time — had their suffixes changed from “time” to “exec_time.” As a result, all monitoring tools that rely on pg_stat_statements should be adjusted accordingly.
The next feature provides more information about parallel queries. In the past, parallel queries did not reveal which back end was related to the parallel query or provide easy insight into all the processes involved in a query. It is now possible to show the parallel group leader information, at the SQL level, using the leader_pid in the pg_stat_activity system view.
New System Views
PostgreSQL 13 offers new system views that can be used for PostgreSQL monitoring. The first group improves progress reporting.
pg_stat_progress_basebackup helps to estimate the progress of a base backup of a PostgreSQL cluster.
pg_stat_progress_analyze shows information about backends running the ANALYZE operation, including autovacuum workers. It can help administrators understand how long it will take to collect statistics.
Another group of new system views helps answer the following question: “How does Postgres spend its allocated memory?” These views will be mostly useful for DBMS admins and low-level support engineers who inspect bottlenecks and optimize memory usage.
pg_shmem_allocation can be enabled for a deeper inspection of how shared memory is used for internal DBMS structures, including a lock table, process array, data about various background processes and so on.
pg_stat_slru is related to the inspection of SLRU caches. SLRU is a special buffer cache where non-user data is stored, such as the state of each transaction, as well as information about subtransactions. Asynchronous notifications also use SLRU.
Wait Events Changes
PostgreSQL 13 introduced the renaming of various wait events to improve consistency. The brief reference to this in the release notes does not do justice to the importance of this change. Wait events were added in Postgres 9.6, and are usually needed to diagnose bottlenecks that occur because of interprocess locks. Previously, wait events were named in all possible styles. Now, the naming is more consistent, and the wait events are better documented and, hopefully, more understandable. However, admins need to update any monitoring scripts that use wait events, including the following:
Hash/Batch/Allocating → HashBatchAllocate
ControlFileLock → ControlFile
clog → XactBuffer
AsyncCtlLock → NotifySLRU
Several new wait events were also introduced in PostgreSQL 13, including:
Upper Limit for the track_activity_query_size Parameter
One final small improvement in PostgreSQL 13 is worth noting: an updated upper limit for the track_activity_query_size parameter. This parameter specifies the amount of memory reserved to store the text of the command. The previous maximum was 100 KB, which could be too small when monitoring applications that generate large SQL queries.
The new upper limit is 1 MB. However, keep in mind that the database allocates the amount of memory specified in track_activity_query_size in advance for all potential sessions. Thus, higher values can increase memory consumption.
The Importance of These Features
These significant updates in PostgreSQL 13 are vital for admins that run the database and want to tune the system, optimize applications, forecast system growth and do resource planning. However, the new features are also extremely important for DBMS developers who need to gather more information and collect real production statistics about various parts of PostgreSQL in order to find bottlenecks and opportunities to make improvements. As always, we look forward to even more great new features in future releases of PostgreSQL.