Mostly NULL column in a composite index

This is just a fun one to show something special that can happen when you allow NULL values in your indexes. Let’s get some basics down first:

  • NULL is special, it is the absense of a value. Oracle knows that it is special and even gives you the greatest present it could ever give you – a free frequency histogram bucket just for NULLs.
  • If all the columns in an index are valued NULL, there will be no index key for this row.
  • Indexes have their own statistics, but no concept of number of nulls.

So, here’s my set up:

drop table as_magic_indexes;
create table as_magic_indexes 
as 
with ctrl as (select 1 from dual connect by rownum  -- >comment for wordpress formatting
select rownum pk_col, mod(rownum,10) filter_10, rpad('x',400,'x') padding_col
      ,cast(null as number) important_col 
      ,mod(rownum,100) magic_col 
from ctrl o 
cross join ctrl a;
update as_magic_indexes
set    important_col = pk_col 
where  pk_col >= (select max(pk_col) from as_magic_indexes)-20;
commit;

We have a 1 million row table (even if WordPress has cut off some of the script: ctrl has 1000 rows). 21 rows have a non-null value for the important_col column. If we gathered statistics and got an explain plan for a select from the table querying a column and this important_col, we should see that Oracle expects a low cardinality.

exec dbms_stats.gather_table_stats(null,'as_magic_indexes',method_opt=>'for all columns size 1')
select num_rows from user_tables where table_name = 'AS_MAGIC_INDEXES';
  NUM_ROWS
----------
   1000000
select column_name, num_nulls, num_distinct from user_tab_columns where table_name = 'AS_MAGIC_INDEXES';
COLUMN_NAME      NUM_NULLS NUM_DISTINCT
--------------- ---------- ------------
PK_COL                   0      1000000
FILTER_10                0           10
PADDING_COL              0            1
IMPORTANT_COL       999979           21
MAGIC_COL                0          100
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 1274804564
--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |     1 |   413 | 16057   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| AS_MAGIC_INDEXES |     1 |   413 | 16057   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("IMPORTANT_COL"=TO_NUMBER(:Y) AND "FILTER_10"=TO_NUMBER(:X))

1 row in fact, a perfect recipe for an index?

create index as_magic_indexes_idx1 on as_magic_indexes (filter_10,important_col);
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key from user_indexes where index_name = 'AS_MAGIC_INDEXES_IDX1';
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- -----------------------
AS_MAGIC_INDEXES_IDX1                   2        2078            31                      67
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 1274804564
--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  | 32258 |    12M| 16057   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| AS_MAGIC_INDEXES | 32258 |    12M| 16057   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("IMPORTANT_COL"=TO_NUMBER(:Y) AND "FILTER_10"=TO_NUMBER(:X))

Oh, what happened there? The index statistics cover both our filter columns in one go, this is useful if there is a relationship between the columns (the same can be achieved with a column group statistic). The CBO will trust this grouped statistic more than it trusts combining two column level statistics, but in doing so we’ve completely lost our num NULLs stat. Our new cardinality is given by our num_rows from user_tables divided by distinct_keys from that index we’ve added (1,000,000/31=32258).

What if we could have an index that covered our two filters and keep our free histogram?

drop index as_magic_indexes_idx1;
create index as_magic_indexes_idx2 on as_magic_indexes (filter_10,important_col,magic_col);
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key from user_indexes where index_name = 'AS_MAGIC_INDEXES_IDX2';
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- -----------------------
AS_MAGIC_INDEXES_IDX2                   2        2492           121                      20
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
Plan hash value: 3008093446
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                       |  8264 |  3333K|  8291   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| AS_MAGIC_INDEXES      |  8264 |  3333K|  8291   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | AS_MAGIC_INDEXES_IDX2 |  8264 |       |    23   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FILTER_10"=TO_NUMBER(:X) AND "IMPORTANT_COL"=TO_NUMBER(:Y))

All we had to do is add another column to the index, giving it a larger number of distinct keys. I’m adding magic_col because I was planning on turning this into a quiz, but decided a blog post would be better. This gave us an explain plan that uses the index, we still have a large cardinality (it’s still num_rows/distinct_keys), this could trip us up later. Note that if the old index still existed, then it’s statistics will still be in play (and you’d get the full tablescan again).

What if we added a column that was unique instead? This could be a sensible idea because we know that our important_col is unique for it’s not null values.

drop index as_magic_indexes_idx2;
create index as_magic_indexes_idx3 on as_magic_indexes (filter_10,important_col,pk_col);
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key from user_indexes where index_name = 'AS_MAGIC_INDEXES_IDX3';
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- -----------------------
AS_MAGIC_INDEXES_IDX3                   2        2769       1000000                       1
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
Plan hash value: 4058028991
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                       |     1 |   413 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| AS_MAGIC_INDEXES      |     1 |   413 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | AS_MAGIC_INDEXES_IDX3 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FILTER_10"=TO_NUMBER(:X) AND "IMPORTANT_COL"=TO_NUMBER(:Y))

Problem solved? Maybe. With all that manipulation, we can be sure we’ve messed up somehow…
Do we still have accurate cardinality if we filter on important_col is null (which should be a lot of the table)?

explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col is null;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 1274804564
--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  | 99998 |    39M| 16056   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| AS_MAGIC_INDEXES | 99998 |    39M| 16056   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FILTER_10"=TO_NUMBER(:X) AND "IMPORTANT_COL" IS NULL)

That seems right to me: 99998 = num_nulls of important col / num_distinct of filter_10. However, let’s not get too confident, just because I can’t think of an immediate problem this has caused, doesn’t mean it’s not possible. Every time I think I’ve done something clever, I probably have messed something else up in a stupid way…

I hope you found this useful, I can imagine this sort of indexing might be helpful in other scenarios too – or at least will give you some imaginative ideas when trying to solve interesting performance problems.

Advertisements