Not all indexes are created equally

The impact of adding an index on the performance of DML is something that comes up every now and then: can it be predicted? Will each added index have the same impact?

To answer this, let’s use our good friend v$active_session_history (usable only if you have the Diagnostics Pack licensed!). Here is a quick and dirty script I use all the time to determine where the time is going for a particular sql_id. If we use it with an insert statement, we can see which indexes are being accessed and in what ratios. First, the script:

get sql_ash.sql

set lines 500  
col "Operation" FOR a35 
break on child_number nodup on "Id" nodup on "Operation" nodup on "Name" nodup on cost nodup on CARDINALITY nodup  
col child_number noprint  
col "Id" for 999  
col event for a30  
col cost for 9999999  
col activity for a10  
col active for a3  
var sql_id varchar2(30 char);  
exec :sql_id := '&1'        
  
  
with ash AS (SELECT sql_id                 -- ash: ASH samples for the sql_id, grouped by plan line, wait event and object
      ,sql_plan_hash_value  
      ,SQL_PLAN_LINE_ID  
      ,NVL(event,'CPU') event  
      ,nvl2(event,o.object_name,null) object_name  
      ,COUNT(DISTINCT sample_id) cnt  
      ,ROUND(SUM(ash.delta_read_io_bytes/1024/1024 ),2) Read_MB  
      ,MAX(sample_time) most_recent_time  
FROM   v$active_session_history ash  
LEFT JOIN dba_objects o  
  ON   ash.current_obj# = o.object_id  
WHERE  ash.sql_id = :sql_id  
GROUP BY sql_id  
      ,sql_plan_hash_value  
      ,SQL_PLAN_LINE_ID  
      ,NVL(event,'CPU')  
      ,nvl2(event,o.object_name,null)  
ORDER BY 1,2,3,5)  
 ,sp AS (SELECT sp.plan_hash_value         -- sp: the execution plan lines from v$sql_plan
               ,id  
               ,sql_id  
               ,LPAD(' ',sp.depth,' ')||sp.operation||' '||sp.options "Operation"  
               ,sp.OBJECT_NAME "Name"  
               ,max(sp.cardinality) cardinality  
               ,MAX(sp.io_cost) cost  
               ,sp.object_name  
        FROM   v$sql_plan sp  
        WHERE  sp.sql_id = :sql_id  
        group by sp.plan_hash_value  
               ,id  
               ,sql_id  
               ,LPAD(' ',sp.depth,' ')||sp.operation||' '||sp.options  
               ,sp.object_name  
       )        
 select active  
       ,PLAN_HASH_VALUE  
       ,"Id"  
       ,"Operation"  
       ,"Name"  
       ,cardinality  
       ,cost  
       ,event  
       ,object_name  
       ,activity  
       ,Read_MB  
 From (  
 select CASE WHEN ash.most_recent_time > sysdate-interval'5'second then '*' else ' ' end active  -- '*' = plan line sampled in the last 5 seconds
      ,sp.plan_hash_value  
      ,sp.id "Id"  
      ,sp."Operation"  
      ,sp.object_name "Name"  
      ,sp.cardinality  
      ,sp.cost  
      ,ash.event  
      ,ash.object_name  
      ,DECODE(ash.cnt,NULL,NULL,ash.cnt||' ('||ROUND(100*RATIO_TO_REPORT(ash.cnt) OVER (PARTITION BY sp.plan_hash_value))||'%)') activity  
      ,ash.cnt  
      ,ash.Read_MB  
 from   sp  
left join   ash  
   on  sp.plan_hash_value = ash.sql_plan_hash_value  
  and sp.id = ash.sql_plan_line_id  
  and sp.sql_id = ash.sql_id  
 order by sp.plan_hash_value,sp.id, ash.cnt  
 )  
 order by plan_hash_value, "Id" , cnt  
/

Note how the query uses COUNT(DISTINCT sample_id) rather than just COUNT(*); this is so that in parallel query environments one sampled second counts as one second, rather than as eight seconds when eight parallel sessions happen to be sampled at the same time. You may prefer not to do this, but it made sense for my usage. You will also probably want to play around with the SQL*Plus formatting settings I’ve made; they’re not for everyone.
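
To illustrate the difference (a minimal sketch, not part of the original script): for a statement sampled while several PX slaves are active, COUNT(*) over the ASH rows is inflated by roughly the number of active slaves per sample, while COUNT(DISTINCT sample_id) counts each one-second sample once.

-- Minimal sketch: compare raw ASH row counts with per-sample counts for a sql_id.
-- For a serial statement the two numbers match; for a parallel one, count(*) is
-- inflated by roughly the number of active PX slaves in each sample.
select count(*)                  ash_rows
      ,count(distinct sample_id) sampled_seconds
from   v$active_session_history
where  sql_id = :sql_id;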

Here’s my setup for the test, using all_objects to build the base table. We have two different indexes:

  • object_name – many distinct values
  • owner – few distinct values

-- test data: copy all_objects, multiply it by 10, then build the table we'll index
create table test_index_perf_b as select ao.* from all_objects ao;
create table test_index_perf_a as select ao.* from test_index_perf_b ao cross join (select rownum rn from dual connect by rownum <=10) d;
create table test_index_perf as select * from test_index_perf_a;
-- one index on a column with many distinct values, one on a column with few
create index test_index_perf_name on test_index_perf (object_name);
create index test_index_perf_owner on test_index_perf (owner);
-- start with a cold cache so the index maintenance shows up as I/O waits
alter system flush buffer_cache;
set serverout off
insert into test_index_perf select * from test_index_perf_a;
select * from table(dbms_xplan.display_cursor);
commit;

Note how I cheekily called dbms_xplan.display_cursor because it gives me the sql_id of the last statement executed (and because I turned serverout off, otherwise the last executed statement would be the fetch from the dbms_output buffer!). Another point is that I flushed the buffer cache before running this, otherwise the indexes would still be completely cached and all my hits would be on CPU; and I’m not sure whether I can trust the object name when the event is NULL (CPU), so I don’t.
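
If you’d rather not rely on that trick, an alternative sketch (not part of my original test) is to ask your own session for the sql_id of the statement it last ran, straight after the insert:

-- Sketch: v$session.prev_sql_id holds the sql_id of the previous statement run by
-- this session. Run it immediately after the insert, before anything else
-- (including any dbms_output fetch), or it will point at a different statement.
select prev_sql_id
from   v$session
where  sid = sys_context('userenv','sid');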

I then run

@sql_ash <sql_id>

The output (with some columns removed while I learn how to format this nicely in WordPress):

  Id Operation                 Name               EVENT                          OBJECT_NAME                    ACTIVITY
---- ------------------------- ------------------ ------------------------------ ------------------------------ ----------
   0 INSERT STATEMENT                             control file parallel write    TEST_INDEX_PERF_NAME           1 (0%)
                                                  Disk file operations I/O       TEST_INDEX_PERF_NAME           1 (0%)
                                                  control file parallel write    TEST_INDEX_PERF_A              2 (1%)
                                                  rdbms ipc reply                TEST_INDEX_PERF_A              3 (1%)
                                                  Data file init write           TEST_INDEX_PERF_NAME           4 (2%)
                                                  CPU                                                           6 (2%)
                                                  Disk file operations I/O       TEST_INDEX_PERF_A              23 (9%)
                                                  Data file init write           TEST_INDEX_PERF_A              25 (10%)
                                                  control file sequential read   TEST_INDEX_PERF_A              37 (15%)
   1  LOAD TABLE CONVENTIONAL  TEST_INDEX_PERF    db file sequential read        TEST_INDEX_PERF_OWNER          3 (1%)
                                                  log buffer space               TEST_INDEX_PERF_A              12 (5%)
                                                  db file sequential read        TEST_INDEX_PERF_NAME           43 (17%)
                                                  CPU                                                           90 (36%)
   2   TABLE ACCESS FULL       TEST_INDEX_PERF_A  direct path read               TEST_INDEX_PERF_A              2 (1%)

So we can clearly see that not all indexes are created equally when it comes to DML performance. If we think about what actually has to happen to maintain an index during the DML operation, it makes sense: each insert must go and find (and update) the index block that will point to the new row. If we are inserting few distinct values for a column then, rather than jumping around the index fairly randomly, we keep going back to the same few blocks over and over again, benefitting from the buffer cache. If the values are spread all over the place then so are our reads against the index, meaning we must load far more of it from disk.
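
One rough way to put numbers on "few values" versus "many values" (a sketch against the test tables above, not output from the original run) is to compare the statistics of the two indexes:

-- Sketch: compare the shape of the two indexes. The owner index has far fewer
-- distinct keys, so a batch of inserts keeps revisiting the same leaf blocks;
-- the object_name index spreads the same inserts across many more blocks.
select index_name, distinct_keys, leaf_blocks, clustering_factor
from   user_indexes
where  table_name = 'TEST_INDEX_PERF';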

Another thing to note is that these results are from my home setup: my disk is an SD card and my CPU is busy doing non-Oracle things too. No other queries were running, so nothing else competed for cache space. In a real-life system you’d probably see much less CPU time, much less time associated with file management (these were new segments), fewer redo waits, and so on.

The real question you must ask is: “Is the overhead acceptable for the improvement in query performance?” If you’re hardly ever going to use the index then perhaps it’s not worth it; if your new business-critical process can really take advantage of the index then perhaps you have no choice.

Something to consider: what if your inserts used mostly (or always) the same value for one column in each batch? Could you add that column to the leading edge of your indexes to improve DML performance? Could that additional column also help the performance of queries that already use the index? This is something I commonly see in a DW environment with fact tables that have many dimensions of a hierarchy. Consider a table of tasks associated with a user, where the user also belongs to a team that is stored in the same table. If we index on the user id alone and load a batch of inserts for one team, our index is going to be hit all over. If we have the team id as the leading column of the index, then the batch hits a much narrower range of it. And if our queries filter on a team id with a few user ids, the same index improves their performance too, because they also hit a narrow range: win-win.
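
As a hedged sketch of that idea (the tasks table and its team_id/user_id columns are hypothetical, not part of the test above):

-- Hypothetical: each batch of inserts into tasks is for a single team, so leading
-- with team_id keeps each batch within a narrow range of the index.
create index tasks_team_user_idx on tasks (team_id, user_id);

-- Queries that filter on a team plus a few users hit the same narrow range.
select *
from   tasks
where  team_id = :team_id
and    user_id in (:user_1, :user_2);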

Andrew
