Myth Busting: Order by cardinality in a composite index

There’s a lot of advice floating around about how to order the columns in a composite index. The suggestions are usually decent enough as a starting point (so long as they remember the Golden Rule of Indexing), but I often see something similar to:

For a query with equality conditions, columns which have more distinct values should appear first

To be clear, this is not a bad rule of thumb. The problem is that it comes from a misunderstanding of how indexes work, and a reader may absorb that same misunderstanding, leading to further confusion. The misunderstanding is that the index traversal is done by searching the index for the first column and then applying another search for the second column; this is not the case. I’m going to set aside the situation where you have multiple queries and some of those queries filter on only one of your columns (if you’re in that situation, leading with that column is what you’re going to do).

A composite B-tree index is just an ordered list of row pointers; the order is given by the column list in the create index statement. This ordering allows it to be searched efficiently. If you have only equality conditions against the columns in the index, then you know exactly where to start reading the index and you know when you’ve read all the data you need. The order of those columns makes no difference to where you start reading or how much data you need to read from the index.
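To see why column order doesn’t change the work for pure equality lookups, here’s a toy model in Python: the composite index is just a sorted list of key tuples, and an equality lookup on every key column is two binary searches bracketing one contiguous slice. This is an illustrative sketch with made-up names, not any engine’s real implementation.

```python
from bisect import bisect_left, bisect_right

# 1,000,000 rows of (id, code_2, code_100000), same distribution as the demo tables.
rows = [(i, i % 2, i % 100000) for i in range(1, 1_000_001)]

# Two composite "indexes": sorted lists of (key columns..., row pointer).
idx_rule = sorted((c100000, c2, rid) for rid, c2, c100000 in rows)  # (code_100000, code_2)
idx_andy = sorted((c2, c100000, rid) for rid, c2, c100000 in rows)  # (code_2, code_100000)

def equality_scan(index, key):
    """Equality on every key column means the matches form one contiguous
    slice: binary-search to the start, binary-search past the end, and read
    everything in between."""
    lo = bisect_left(index, key)
    hi = bisect_right(index, key + (float("inf"),))
    return [entry[-1] for entry in index[lo:hi]]

ids_rule = equality_scan(idx_rule, (10, 0))   # code_100000 = 10 and code_2 = 0
ids_andy = equality_scan(idx_andy, (0, 10))   # code_2 = 0 and code_100000 = 10
assert sorted(ids_rule) == sorted(ids_andy)   # same 10 row pointers either way
```

Both orders cost two binary searches plus a read of the same-sized slice; only the position of the slice within the list differs.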

Example

Maybe you need convincing – good! Let’s create a table of 1,000,000 rows, consisting of an id column, a column with 100,000 distinct values (code_100000) and a column with 2 distinct values (code_2). The two code columns are the columns we’re filtering on – they have a massive difference in selectivity, so if a difference in performance exists it should be easy to show.

Let’s say the query we want to optimize looks like this:

select id, code_2, code_100000 from myth_table where code_2=1 and code_100000=1;

If we were to follow the mentioned rule of thumb, we would create the index with code_100000 as the leading column and code_2 as the next column:

create index myth_table_rule on myth_table (code_100000, code_2) ;

And if we were to ignore that rule and go in the opposite order we would have:

create index myth_table_andy on myth_table (code_2, code_100000) ;

What really happens

Oracle

I’m using the mod function to get a known distribution of values; you can run this on your own system with whatever distribution you like if you don’t trust me 🙂

drop table myth_table;
create table myth_table (
  id number,
  code_2 number,
  code_100000 number
);
insert into myth_table (
    id, code_2,code_100000
)
select rownum,
    mod(rownum,2),
    mod(rownum,100000)
from dual connect by rownum <= 1000000;
select count(*), count(distinct code_2), count(distinct code_100000)
from myth_table;
commit;

create index myth_table_rule on myth_table (code_100000, code_2) ;
create index myth_table_andy on myth_table (code_2, code_100000) ;

alter session set statistics_level=all;
set serverout off

set feedback only
select /*+ index (myth_table myth_table_rule) */* from myth_table where code_100000 = 10 and code_2 = 0;
set feedback on
select * from dbms_xplan.display_cursor(sql_id=>null,format=>'typical allstats last');
set feedback only
select /*+ index (myth_table myth_table_andy) */* from myth_table where code_100000 = 10 and code_2 = 0;
set feedback on
select * from dbms_xplan.display_cursor(sql_id=>null,format=>'typical allstats last');

And the results:

  COUNT(*) COUNT(DISTINCTCODE_2) COUNT(DISTINCTCODE_100000)
---------- --------------------- --------------------------
   1000000                     2                     100000

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |       |    49 (100)|          |     10 |00:00:00.01 |      14 |      2 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| MYTH_TABLE      |      1 |     40 |  1560 |    49   (0)| 00:00:01 |     10 |00:00:00.01 |      14 |      2 |
|*  2 |   INDEX RANGE SCAN                  | MYTH_TABLE_RULE |      1 |     40 |       |     3   (0)| 00:00:01 |     10 |00:00:00.01 |       4 |      2 |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE_100000"=10 AND "CODE_2"=0)


----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |       |    49 (100)|          |     10 |00:00:00.01 |      14 |      2 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| MYTH_TABLE      |      1 |     40 |  1560 |    49   (0)| 00:00:01 |     10 |00:00:00.01 |      14 |      2 |
|*  2 |   INDEX RANGE SCAN                  | MYTH_TABLE_ANDY |      1 |     40 |       |     3   (0)| 00:00:01 |     10 |00:00:00.01 |       4 |      2 |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE_2"=0 AND "CODE_100000"=10)

The number of buffers (logical IOs) against the two indexes was exactly the same – 4. They found exactly the same 10 ROWIDs, so they ended up reading the same rows from the table. The order of columns made no difference.

SQL Server

We’ll do the same on SQL Server. I’ll create a heap table for simplicity (you can try it yourself with a clustered index). I’m using similar logic to ensure the same data distribution as my Oracle case, but it’s important to note that this is not a comparison of performance between the two engines; it’s about performance within each engine using different indexing choices. I’m using set statistics io on to get information about the IO we’re doing here.

drop table myth_table;
create table myth_table (
  id int,
  code_2 int,
  code_100000 int
);
with rows_10 as (select * from (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as t(i))
  ,cte as (select row_number() over(order by r.i) rn
           from   rows_10 r
                  cross join rows_10 r1 -- 100  
                  cross join rows_10 r2 -- 1,000
                  cross join rows_10 r3 -- 10,000  
                  cross join rows_10 r4 -- 100,000  
                  cross join rows_10 r5 -- 1,000,000  
         )
insert into myth_table (
    id, code_2,code_100000
)
select  rn,
        rn % 2,
        rn % 100000
from   cte
where  rn <= 1000000;

select count(*), count(distinct code_2), count(distinct code_100000)
from myth_table;

create index myth_table_rule on myth_table (code_100000, code_2) ;
create index myth_table_andy on myth_table (code_2, code_100000) ;

set statistics io on
select * from myth_table with (index (myth_table_rule))  where code_100000 = 10 and code_2 = 0;
select * from myth_table with (index (myth_table_andy))  where code_100000 = 10 and code_2 = 0;
set statistics io off

And the results:

                        
----------- ----------- -----------
1000000     2           100000
...
Table 'myth_table'. Scan count 1, logical reads 13, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
...
Table 'myth_table'. Scan count 1, logical reads 13, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Here we can see we’re doing 13 logical reads no matter which index is used. If you want to read the actual execution plans, I’ve shared them here: https://www.brentozar.com/pastetheplan/?id=BJz2LhVEc

PostgreSQL

PostgreSQL doesn’t support hints, so if we want to force the optimizer we need to be a bit more dramatic; dropping indexes is probably the easiest approach for the demo. I’ll use EXPLAIN (ANALYZE, BUFFERS) to get output similar to what Oracle gives us.

CREATE TABLE myth_table (
  id SERIAL UNIQUE NOT NULL,
  code_2 int NOT NULL,
  code_100000 int NOT NULL
);
insert into myth_table (
    code_2,code_100000
)
select
    mod(i,2),
    mod(i,100000)
from generate_series(1, 1000000) s(i);
select count(*), count(distinct code_2), count(distinct code_100000)
from myth_table;
create index myth_table_ix01 on myth_table (code_2, code_100000);
EXPLAIN (ANALYZE, BUFFERS)
select * from myth_table where code_100000 = 10 and code_2 = 0;
drop index myth_table_ix01;
create index myth_table_ix02 on myth_table (code_100000, code_2);
EXPLAIN (ANALYZE, BUFFERS)
select * from myth_table where code_100000 = 10 and code_2 = 0;
drop index myth_table_ix02;

And the results:

count	count	count
1000000	2	100000

Bitmap Heap Scan on myth_table  (cost=4.68..99.96 rows=25 width=12) (actual time=0.057..0.174 rows=10 loops=1)
  Recheck Cond: ((code_2 = 0) AND (code_100000 = 10))
  Heap Blocks: exact=10
  Buffers: shared read=13
  ->  Bitmap Index Scan on myth_table_ix01  (cost=0.00..4.67 rows=25 width=0) (actual time=0.040..0.040 rows=10 loops=1)
        Index Cond: ((code_2 = 0) AND (code_100000 = 10))
        Buffers: shared read=3
Planning:
  Buffers: shared hit=19 read=7
...
Bitmap Heap Scan on myth_table  (cost=4.68..99.96 rows=25 width=12) (actual time=0.052..0.069 rows=10 loops=1)
  Recheck Cond: ((code_100000 = 10) AND (code_2 = 0))
  Heap Blocks: exact=10
  Buffers: shared hit=10 read=3
  ->  Bitmap Index Scan on myth_table_ix02  (cost=0.00..4.67 rows=25 width=0) (actual time=0.043..0.043 rows=10 loops=1)
        Index Cond: ((code_100000 = 10) AND (code_2 = 0))
        Buffers: shared read=3
Planning:
  Buffers: shared hit=20 read=1 dirtied=1
...

The lines underneath each Bitmap Index Scan line tell us how much work we did against the index: both times we did 3 reads against the index, and those led to an additional 10 reads from the table (the Bitmap Heap Scan buffer counts include the index’s, so both plans total 13 buffers). If you would like to try tweaking the demo, you can do so with DBFiddle.
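If you’d rather not spin up a server at all, the same experiment can be sketched in SQLite (which also uses B-tree indexes) via Python’s standard-library sqlite3 module. The identifier names mirror the demo above; SQLite’s INDEXED BY clause plays the role of the hints.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table myth_table (id int, code_2 int, code_100000 int)")
con.executemany(
    "insert into myth_table values (?, ?, ?)",
    ((i, i % 2, i % 100000) for i in range(1, 1_000_001)),
)
con.execute("create index myth_table_rule on myth_table (code_100000, code_2)")
con.execute("create index myth_table_andy on myth_table (code_2, code_100000)")
con.commit()

results = {}
for idx in ("myth_table_rule", "myth_table_andy"):
    # INDEXED BY forces the named index, much like the hints used earlier.
    plan = con.execute(
        f"explain query plan select id from myth_table indexed by {idx} "
        "where code_100000 = 10 and code_2 = 0"
    ).fetchone()[-1]
    results[idx] = sorted(
        r[0]
        for r in con.execute(
            f"select id from myth_table indexed by {idx} "
            "where code_100000 = 10 and code_2 = 0"
        )
    )
    print(plan)  # both plans SEARCH the index using both columns

assert results["myth_table_rule"] == results["myth_table_andy"]
```

Both plans report a SEARCH on their index using both columns, and both return the same 10 ids.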

Does it ever matter?

Of course it matters! Once you’ve included all the columns with equality conditions (that are worth including), you have one final column which can provide index selectivity. Here you choose based on how selective each range filter is going to be – this is your last piece of index selectivity. For example, if you had the query:

select * 
from  queue_table 
where queue = 'MYQUEUE' 
and   status='READY' 
and   expire_date > sysdate 
and   start_date < sysdate 
and   owner <> 'BLOCKED'

The two equality conditions here are obvious, so we can start with those (again, assuming they’re decent filters). That leaves our filters against expire_date, start_date and owner. The owner filter can’t drive a range scan1, so we can discount it first. We just need to decide which filter reduces the number of rows from our already-filtered set the most. It’s simple enough to run a small query to figure this out:

select count(case when expire_date > sysdate then 1 else null end) expire_date
      ,count(case when start_date  < sysdate then 1 else null end) start_date  
      ,count(*)
from  queue_table 
where queue = 'MYQUEUE' 
and   status='READY' 

This will tell us what we need to know – and if the count(*) result is sufficiently small, we might decide the index is already good enough. Let’s imagine the result came out to be:

EXPIRE_DATE  START_DATE   COUNT(*)
----------- ----------- ----------
       3986         200       4000

The choice is clear here: the start_date filter reduces our index selectivity to 200, so including start_date after queue and status is a decent idea. At this point, our index selectivity can no longer be improved. Adding further columns can reduce the amount of data read from the table, but you’re growing the index (and therefore having to do more work against it). It may still be beneficial overall to add the columns – it’s a balancing act.
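To make the arithmetic concrete, here’s a toy Python model of that slice of the index. The data, the NOW value and the distributions are all made up purely to reproduce the counts from the probe query above; within the (queue, status) = ('MYQUEUE', 'READY') slice, the entries are ordered by whichever date column comes third, and the range predicate bounds how much of that slice must be read.

```python
import random
from bisect import bisect_left, bisect_right

random.seed(1)
NOW = 1000  # a stand-in for sysdate

# Hypothetical date values for the 4,000 READY rows in MYQUEUE, built so
# that 200 rows have start_date < NOW and 3,986 have expire_date > NOW.
starts = [random.randrange(0, NOW) for _ in range(200)] + \
         [random.randrange(NOW, 2000) for _ in range(3800)]
expires = [random.randrange(NOW + 1, 3000) for _ in range(3986)] + \
          [random.randrange(0, NOW) for _ in range(14)]

# Each sorted list models the third index column within the equality slice.
idx_by_start = sorted(starts)
idx_by_expire = sorted(expires)

# How many index entries must the range scan read in each case?
scanned_with_start = bisect_left(idx_by_start, NOW)                          # start_date < NOW
scanned_with_expire = len(idx_by_expire) - bisect_right(idx_by_expire, NOW)  # expire_date > NOW
print(scanned_with_start, scanned_with_expire)  # prints 200 3986
```

Leading the final position with start_date means the range scan reads 200 index entries instead of 3,986 – exactly the difference the probe query predicted.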

1We can technically split owner <> 'BLOCKED' into (owner < 'BLOCKED' or owner > 'BLOCKED'), which could be expanded to give us two index range scans. In this case, it’s also a filter which is not going to provide great selectivity, so we don’t need to worry. The reader is welcome to see whether their DBMS of choice can make this transformation when it is worthwhile.

Summary

When the columns in an index are subject to equality conditions, it doesn’t matter which order they appear in, so long as they come before any columns which are not subject to equality conditions (see the Golden Rule of Indexing). Like I said in the intro, ordering them by cardinality is not a bad rule of thumb – it’s not going to give you bad performance – it just doesn’t matter.

I haven’t mentioned a few things that could be interesting to think about:

  • Does the order of columns affect the clustering factor statistic?
  • Does the real clustering factor impact the performance?
  • Does having a highly repetitive leading column give benefits to prefix compression – does that impact performance?
  • Does having access to index skip scans make one way more forgiving?