PostgreSQL Multicolumn Indexes

Summary: in this tutorial, you will learn how to create PostgreSQL multicolumn indexes, which are indexes defined on two or more columns of a table.

Introduction to PostgreSQL multicolumn indexes

When you create an index on two or more columns within a table, this type of index is called a multicolumn index.

A multicolumn index is often referred to as a composite index, a combined index, or a concatenated index.

A multicolumn index can have a maximum of 32 columns. The limit can be adjusted by modifying the pg_config_manual.h file when building PostgreSQL source code.

Additionally, only B-tree, GIST, GIN, and BRIN index types support multicolumn indexes.

The following shows the syntax for creating a multicolumn index:

CREATE INDEX [IF NOT EXISTS] index_name
ON table_name(column1, column2, ...);

In this syntax:

First, specify the index name in the CREATE INDEX clause. Use the IF NOT EXISTS option to prevent an error from creating an index whose name already exists.
Second, provide the table name along with the index columns in the parenthesis.

When defining a multicolumn index, you should place the columns that are frequently used in the WHERE clause at the beginning of the column list, followed by the columns that are less frequently used in the WHERE clause.

In general, the query optimizer can use the index when the query’s conditions involve the index’s leading (leftmost) column. For example, if you have an index on (column1, column2, column3), it will be considered for queries such as:

WHERE column1 = v1 AND column2 = v2 AND column3 = v3;

WHERE column1 = v1 AND column2 = v2;

WHERE column1 = v1;

In these cases, the condition on column1 (and optionally on column2) allows PostgreSQL to efficiently narrow down the portion of the index that needs to be scanned.

However, if a query does not constrain the first column of the index, PostgreSQL must evaluate whether a full index scan on this index is more efficient than alternative indexes or a table scan.

For instance, consider a query with only later columns in the WHERE clause:

WHERE column3 = v3;

WHERE column2 = v2 and column3 = v3;

Note that you can also use the WHERE clause to define a partially multicolumn index.

In such scenarios, PostgreSQL will still consider using the index, but scanning the whole index may have a higher cost than other options. The planner evaluates different execution paths, and if an index scan is not the most efficient, it may choose a sequential scan instead.

note

You can also use a WHERE clause to define a partial multicolumn index (an index on multiple columns that only includes rows satisfying a given condition).

PostgreSQL Multicolumn Index example

First, create a new table called people using the following CREATE TABLE statement:

CREATE TABLE people (
    id INT GENERATED BY DEFAULT AS IDENTITY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);

The people table consists of three columns: id, first name, and last name.

Second, execute the INSERT statement in the following file to load 10,000 rows into the people table:

Script to load 10000 names

Third, show the query plan that finds the person whose last name is Adams:

EXPLAIN SELECT
  id,
  first_name,
  last_name
FROM
  people
WHERE
  last_name = 'Adams';

Here is the output:

QUERY PLAN
---------------------------------------------------------
 Seq Scan on people  (cost=0.00..83.88 rows=9 width=240)
   Filter: ((last_name)::text = 'Adams'::text)
(2 rows)

The output indicates that PostgreSQL performs a sequential scan on the people table to find the matching rows because there is no index defined for the last_name column.

Fourth, create a multicolumn index that includes both the last_name and first_name columns. Assuming that searching for people by their last name is more common than by their first name, we define the index with the following column order:

CREATE INDEX idx_people_names
ON people (last_name, first_name);

Fifth, show the plan of the query that searches for the person whose last name is Adams (using the new index):

EXPLAIN SELECT
  id,
  first_name,
  last_name
FROM
  people
WHERE
  last_name = 'Adams';

Output:

QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on people  (cost=4.42..44.07 rows=18 width=17)
   Recheck Cond: ((last_name)::text = 'Adams'::text)
   ->  Bitmap Index Scan on idx_people_names  (cost=0.00..4.42 rows=18 width=0)
         Index Cond: ((last_name)::text = 'Adams'::text)
(4 rows)

The output indicates that the query optimizer uses the idx_people_names index for the last_name = 'Adams' query.

Sixth, find the person whose first name is Lou (without specifying the last name):

EXPLAIN SELECT
  id,
  first_name,
  last_name
FROM
  people
WHERE
  last_name = 'Adams'
  AND first_name = 'Lou';

Output:

QUERY PLAN
--------------------------------------------------------------------------------------------
 Index Scan using idx_people_names on people  (cost=0.29..8.30 rows=1 width=17)
   Index Cond: (((last_name)::text = 'Adams'::text) AND ((first_name)::text = 'Lou'::text))
(2 rows)

The output indicates that the query optimizer will use the index because both columns in the WHERE clause (first_name and last_name) are included in the index.

Seventh, search for the person whose first name is Lou:

EXPLAIN SELECT
  id,
  first_name,
  last_name
FROM
  people
WHERE
  first_name = 'Lou';

Output:

QUERY PLAN
----------------------------------------------------------
 Seq Scan on people  (cost=0.00..186.00 rows=32 width=17)
   Filter: ((first_name)::text = 'Lou'::text)
(2 rows)

The output indicates that PostgreSQL performs a sequential scan instead of using the index, because the first column of the index (last_name) is not constrained. Since using the index would require scanning all its entries, the planner determined that a sequential scan on the table was more efficient.

Summary

Use a PostgreSQL multicolumn index to define an index involving two or more columns from a table.
Place the columns that are frequently used in the WHERE clause at the beginning of the column list of the multicolumn index.

PostgreSQL Multicolumn Indexes

Introduction to PostgreSQL multicolumn indexes

note

PostgreSQL Multicolumn Index example

Summary

On this page