Mostly NULL column in a composite index

This is just a fun one to show something special that can happen when you allow NULL values in your indexes. Let’s get some basics down first:

  • NULL is special, it is the absence of a value. Oracle knows that it is special and even gives you the greatest present it could ever give you – a free num_nulls statistic tracked just for NULLs.
  • If all the columns in an index are NULL for a row, there will be no index key for that row (see the quick illustration after this list).
  • Indexes have their own statistics, but no concept of number of nulls.
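
A quick illustration of that second bullet (a throwaway table, not part of the main demo):

create table null_demo (col1 number);
create index null_demo_idx on null_demo (col1);
insert into null_demo values (null);
commit;
-- col1 is the only column in the index key and it is NULL, so this row gets
-- no entry in null_demo_idx; a "where col1 is null" query can't rely on the
-- index and has to full scan the table.
select * from null_demo where col1 is null;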

So, here’s my set up:

drop table as_magic_indexes;
create table as_magic_indexes 
as 
with ctrl as (select 1 from dual connect by rownum <= 1000)
select rownum pk_col, mod(rownum,10) filter_10, rpad('x',400,'x') padding_col
      ,cast(null as number) important_col 
      ,mod(rownum,100) magic_col 
from ctrl o 
cross join ctrl a;
update as_magic_indexes
set    important_col = pk_col 
where  pk_col >= (select max(pk_col) from as_magic_indexes)-20;
commit;

We have a 1 million row table (ctrl has 1,000 rows, so cross joining it with itself gives 1,000,000). 21 rows have a non-null value for the important_col column. If we gather statistics and get an explain plan for a select filtering on another column and this important_col, we should see that Oracle expects a low cardinality.

exec dbms_stats.gather_table_stats(null,'as_magic_indexes',method_opt=>'for all columns size 1')
select num_rows from user_tables where table_name = 'AS_MAGIC_INDEXES';
  NUM_ROWS
----------
   1000000
select column_name, num_nulls, num_distinct from user_tab_columns where table_name = 'AS_MAGIC_INDEXES';
COLUMN_NAME      NUM_NULLS NUM_DISTINCT
--------------- ---------- ------------
PK_COL                   0      1000000
FILTER_10                0           10
PADDING_COL              0            1
IMPORTANT_COL       999979           21
MAGIC_COL                0          100
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 1274804564
--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |     1 |   413 | 16057   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| AS_MAGIC_INDEXES |     1 |   413 | 16057   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("IMPORTANT_COL"=TO_NUMBER(:Y) AND "FILTER_10"=TO_NUMBER(:X))

An estimate of 1 row, in fact – which is roughly what the standard selectivity arithmetic gives: 1,000,000 × 1/10 (filter_10) × 21/1,000,000 × 1/21 (important_col’s not-null fraction times one over its num_distinct) ≈ 0.1, rounded up to 1. A perfect recipe for an index?

create index as_magic_indexes_idx1 on as_magic_indexes (filter_10,important_col);
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key from user_indexes where index_name = 'AS_MAGIC_INDEXES_IDX1';
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- -----------------------
AS_MAGIC_INDEXES_IDX1                   2        2078            31                      67
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 1274804564
--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  | 32258 |    12M| 16057   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| AS_MAGIC_INDEXES | 32258 |    12M| 16057   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("IMPORTANT_COL"=TO_NUMBER(:Y) AND "FILTER_10"=TO_NUMBER(:X))

Oh, what happened there? The index statistics cover both our filter columns in one go. That is useful when there is a relationship between the columns (the same effect can be achieved with a column group statistic), and the CBO trusts this grouped statistic more than it trusts multiplying the two column-level selectivities together. But in doing so we’ve completely lost our num_nulls stat: the new cardinality is simply num_rows from user_tables divided by distinct_keys from the index we’ve added (1,000,000/31 = 32258).
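
For comparison, here’s a hedged sketch of the column group route mentioned above (not part of the original demo). The column group would end up with the same 31 distinct groupings, so you’d see the same 32258 estimate:

select dbms_stats.create_extended_stats(null,'AS_MAGIC_INDEXES','(FILTER_10,IMPORTANT_COL)')
from   dual;
exec dbms_stats.gather_table_stats(null,'as_magic_indexes',method_opt=>'for all columns size 1')
-- The extension is a hidden SYS_STU... virtual column with its own num_distinct;
-- the CBO prefers it to multiplying the two single-column selectivities.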

What if we could have an index that covered our two filters and keep our free histogram?

drop index as_magic_indexes_idx1;
create index as_magic_indexes_idx2 on as_magic_indexes (filter_10,important_col,magic_col);
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key from user_indexes where index_name = 'AS_MAGIC_INDEXES_IDX2';
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- -----------------------
AS_MAGIC_INDEXES_IDX2                   2        2492           121                      20
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
Plan hash value: 3008093446
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                       |  8264 |  3333K|  8291   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| AS_MAGIC_INDEXES      |  8264 |  3333K|  8291   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | AS_MAGIC_INDEXES_IDX2 |  8264 |       |    23   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FILTER_10"=TO_NUMBER(:X) AND "IMPORTANT_COL"=TO_NUMBER(:Y))

All we had to do was add another column to the index, giving it a larger number of distinct keys. (I picked magic_col because I was planning on turning this into a quiz, but decided a blog post would be better.) This gave us a plan that uses the index, though we still have a large cardinality estimate – it’s still num_rows/distinct_keys, i.e. 1,000,000/121 = 8264 – and that could trip us up later. Note that if the old index still existed, its statistics would still be in play (and you’d get the full tablescan again).

What if we added a column that was unique instead? This could be a sensible idea because we know that our important_col is unique for its not-null values.

drop index as_magic_indexes_idx2;
create index as_magic_indexes_idx3 on as_magic_indexes (filter_10,important_col,pk_col);
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key from user_indexes where index_name = 'AS_MAGIC_INDEXES_IDX3';
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- -----------------------
AS_MAGIC_INDEXES_IDX3                   2        2769       1000000                       1
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
Plan hash value: 4058028991
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                       |     1 |   413 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| AS_MAGIC_INDEXES      |     1 |   413 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | AS_MAGIC_INDEXES_IDX3 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FILTER_10"=TO_NUMBER(:X) AND "IMPORTANT_COL"=TO_NUMBER(:Y))

Problem solved? Maybe. With all that manipulation, the chances are we’ve messed something up somewhere…
Do we still get an accurate cardinality if we filter on important_col is null (which should match a lot of the table)?

explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col is null;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 1274804564
--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  | 99998 |    39M| 16056   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| AS_MAGIC_INDEXES | 99998 |    39M| 16056   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FILTER_10"=TO_NUMBER(:X) AND "IMPORTANT_COL" IS NULL)

That seems right to me: 99998 = num_nulls of important_col / num_distinct of filter_10 (999979/10, rounded). However, let’s not get too confident; just because I can’t think of an immediate problem this has caused doesn’t mean one isn’t possible. Every time I think I’ve done something clever, I’ve probably messed something else up in a stupid way…

I hope you found this useful; I can imagine this sort of indexing might be helpful in other scenarios too – or at least it will give you some imaginative ideas when trying to solve interesting performance problems.

Visualising ASH

I have a directory where I try to keep all the queries I write for diagnosing performance issues; these have evolved over time to help identify different problems, and they typically use v$active_session_history in some way. I wanted to share some of them with my colleagues, but found they aren’t so confident with AWR and find it difficult to reach conclusions with so much information.

I took a step back and tried to make some accessible reports in SQL Developer that hopefully anyone can run (they do need to have the diagnostics pack licence though). I’m offering these reports to the community; hopefully you can use them in your own work, and possibly make your own enhancements to them (feel free to share those even further). Yes, there are already built-in ASH reports in SQL Developer, but I don’t feel they’re done to the level I would want. Plus, DIY is always more rewarding.

Currently there are two main reports. The first shows you what’s been sampled in ASH in the last 10 minutes, and allows you to drill into a module, action, event combination to see what SQL it was executing and for how long – just click on the bar you find interesting. I expect you might want to modify how it groups (perhaps schema, client_identifier or program etc. is more appropriate for your use).

quick_ash_charts
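
If you’re curious what drives the chart, the underlying query is shaped roughly like this (a hedged sketch – the real SQL is in the XML download below, and the grouping columns are the ones you’d swap out):

select nvl(ash.module,'?') module
      ,nvl(ash.action,'?') action
      ,nvl(ash.event,'ON CPU') event  -- CPU samples have a null event
      ,count(*) samples               -- each sample is roughly one second of DB time
from   v$active_session_history ash
where  ash.sample_time >= sysdate - interval '10' minute
group by ash.module, ash.action, ash.event
order by samples desc;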

The second report is an adaptation of my sql_ash.sql script that I shared in Not all indexes are created equally. To run the report as-is you will need to be on at least 12.1.0.1; if you aren’t on 12c yet then run the pre-12c report – no features are missing, the 12c version just needed modifying to cope with adaptive plans (you can also just remove the concept of full_plan_hash_value from the SQL yourself). The gauge will show you which part of the plan is getting the most time allocated to it. Unfortunately, this screenshot is from my sandbox system, so it’s not a very interesting piece of SQL I’m looking at here.

quick_ash_sql_ash

I couldn’t figure out how to call a child report from a child report – if you do know, then please let me know 🙂 For now, you would just copy and paste the sql_id value.

To get the reports set up, you’ll need to copy the below XML into an xml file and import it into your SQL Developer – right-click on User Defined Reports and choose the Open Report… option.

open_reports.PNG

Get the XML report here: quick_ash_queries_report_export.xml

GIF of the whole process:

visualize_ash_fast

Hope you find this useful.

-Edit 25/11/2017
I’ve made some adjustments and also included a pre-12c version of the SQL ASH report at the request of a commenter. There’s also a gif to explain installing and running the reports.

How many rows were inserted/updated/deleted in my merge?

Before we get started, I need to make one thing very clear.

Do not attempt this unless you have paid for the tuning pack licence!

I’ve seen this asked a few times and have always thought it was just unachievable, but I discovered this by accident today so thought I’d share.

First, let’s have some test data. I’m super lazy, so let’s get QuickSQL to do the work for us: we go to QuickSQL and enter the following as our data model

# pk: "identity"
employees_trg /insert 5
 name /nn vc50
 email /lower
 cost center num
 date hired
 job
employees_mrg /insert 10
 name /nn vc50
 email /lower
 cost center num
 date hired
 job

(Okay, I admit I copied one of the sample ones and made some tweaks and took out the triggers from the generated code, I’m that lazy)

I won’t include the full script it came out with, you can run it yourself and see. But here is the data:

select * from employees_trg;

  ID NAME               EMAIL                        COST_CENTER DATE_HIRED          JOB
---- ------------------ ---------------------------- ----------- ------------------- ------------------------
   1 Gricelda Luebbers  gricelda.luebbers@ab4z.com            17 13/10/2017 22:46:33 Executive Engineer
   2 Dean Bollich       dean.bollich@ab40.com                  7 14/07/2017 22:46:33 Data Architect
   3 Milo Manoni        milo.manoni@ab41.com                  26 19/08/2017 22:46:33 Marketing Associate
   4 Laurice Karl       laurice.karl@ab42.com                  3 13/08/2017 22:46:34 Customer Advocate
   5 August Rupel       august.rupel@ab43.com                 75 17/08/2017 22:46:34 Web Developer

select * from employees_mrg;

  ID NAME                EMAIL                        COST_CENTER DATE_HIRED          JOB
---- ------------------- ---------------------------- ----------- ------------------- ---------------------------
   1 Gricelda Luebbers   gricelda.luebbers@ab4z.com            62 05/08/2017 22:46:34 Solustions Specialist
   2 Dean Bollich        dean.bollich@ab40.com                 37 08/08/2017 22:46:34 Help Desk Specialist
   3 Milo Manoni         milo.manoni@ab41.com                  52 13/07/2017 22:46:34 Application Developer
   4 Laurice Karl        laurice.karl@ab42.com                 94 20/07/2017 22:46:34 Quality Control Specialist
   5 August Rupel        august.rupel@ab43.com                  5 20/08/2017 22:46:35 Analyst
   6 Salome Guisti       salome.guisti@ab44.com                18 28/07/2017 22:46:35 Network Architect
   7 Lovie Ritacco       lovie.ritacco@ab45.com                21 06/09/2017 22:46:35 Programmer Analyst
   8 Chaya Greczkowski   chaya.greczkowski@ab46.com            36 26/07/2017 22:46:35 Application Developer
   9 Twila Coolbeth      twila.coolbeth@ab47.com               81 24/08/2017 22:46:35 Sustaining Engineering
  10 Carlotta Achenbach  carlotta.achenbach@ab48.com           11 12/10/2017 22:46:35 Support Specialist

We have two employees tables: one target, one merge source. Now let’s write the merge; we want to include inserts/updates/deletes, all with filters (to show off). I’ve also added some extra information so we can double-check our numbers.

As you may have guessed from the bold statement at the top, we need to monitor the execution of this SQL. Easy enough if we add a /*+ monitor */ hint; otherwise it would need to take at least 5 seconds or run in parallel to be automatically monitored.

merge /*+monitor*/ into employees_trg t
using employees_mrg m
on (t.id = m.id)
when matched then update
set t.name = m.name
   ,t.job  = 'Updated'
where  length(t.email) <= length(m.email)
delete where t.date_hired = sysdate-50;

7 rows merged.

select * from EMPLOYEES_TRG;

   ID NAME                EMAIL                        COST_CENTER DATE_HIRED          JOB
----- ------------------- ---------------------------- ----------- ------------------- -----------
    1 Gricelda Luebbers   gricelda.luebbers@ab4z.com            17 13/10/2017 23:09:34 Updated
    3 Milo Manoni         milo.manoni@ab41.com                  26 19/08/2017 23:09:34 Updated
    4 Laurice Karl        laurice.karl@ab42.com                  3 13/08/2017 23:09:34 Updated
   27 Lovie Ritacco       lovie.ritacco@ab45.com                21 06/09/2017 23:09:35 Inserted
   30 Carlotta Achenbach  carlotta.achenbach@ab48.com           11 12/10/2017 23:09:36 Inserted

So we have 3 updated rows and 2 inserted rows, and we can see we’re missing ids 2 and 5, so they must have been deleted.

Now let’s check out the SQL Monitor report. First we get the SQL_ID for our statement; we can use the sql_find script that I shared in 4 EASY lessons to enhance your performance diagnostics.

@sql_find "merge /*+monitor*/ into employees_trg"
SQL_ID        LAST_ACTIVE_TIME    EXECUTIONS PARSE_CALLS    FETCHES ROWS_PROCESSED
------------- ------------------- ---------- ----------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------------------------
2qcnzh2cqfugz 19/10/2017 23:10:00          1           1          0              7
merge /*+monitor*/ into employees_trg t using employees_mrg m on (t.id = m.id) when matched then upd

Now we can run this sql_id through the dbms_sql_monitor report_sql_monitor function.

set lines 250
col report for a250
set long 2000000000
set longch 250
select dbms_sql_monitor.report_sql_monitor
(sql_id=> '2qcnzh2cqfugz'
,type=>'TEXT'
,report_level=>'all'
) report
from dual;
Global Information
------------------------------
 Status              :  DONE
 Instance ID         :  1
 Session             :  ANDY (259:56483)
 SQL ID              :  2qcnzh2cqfugz
 SQL Execution ID    :  16777216
 Execution Started   :  10/19/2017 23:10:00
 First Refresh Time  :  10/19/2017 23:10:00
 Last Refresh Time   :  10/19/2017 23:10:00
 Duration            :  .018252s
 Module/Action       :  SQL*Plus/-
 Service             :  pdb1.andrew
 Program             :  sqlplus.exe

Global Stats
========================================================
| Elapsed |   Cpu   |    IO    | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
========================================================
|    0.02 |    0.02 |     0.00 |     89 |    1 |  8192 |
========================================================

SQL Plan Monitoring Details (Plan Hash Value=2280051147)
=================================================================================================================================================
| Id |         Operation          |     Name      |  Rows   | Cost |   Time    | Start  | Execs |   Rows   |  Mem  | Activity | Activity Detail |
|    |                            |               | (Estim) |      | Active(s) | Active |       | (Actual) | (Max) |   (%)    |   (# samples)   |
=================================================================================================================================================
|  0 | MERGE STATEMENT            |               |         |      |         1 |     +0 |     1 |        2 |     . |          |                 |
|  1 |   MERGE                    | EMPLOYEES_TRG |         |      |         1 |     +0 |     1 |        2 |     . |          |                 |
|  2 |    VIEW                    |               |         |      |         1 |     +0 |     1 |       10 |     . |          |                 |
|  3 |     SEQUENCE               | ISEQ$$_80273  |         |      |         1 |     +0 |     1 |       10 |     . |          |                 |
|  4 |      HASH JOIN RIGHT OUTER |               |      10 |    6 |         1 |     +0 |     1 |       10 | 723KB |          |                 |
|  5 |       TABLE ACCESS FULL    | EMPLOYEES_TRG |       5 |    3 |         1 |     +0 |     1 |        5 |     . |          |                 |
|  6 |       TABLE ACCESS FULL    | EMPLOYEES_MRG |      10 |    3 |         1 |     +0 |     1 |       10 |     . |          |                 |
=================================================================================================================================================

We can’t really tell much from this about what DML was actually done; we can see the 2 rows on plan line 1, but it’s not obvious what that number refers to.

I wrote the following query when I discovered that, hidden in the depths of v$sql_plan_monitor, there are some pivoted-out columns which can (among other things) be used to determine why a DOP downgrade was experienced. I knew this was possible in 12c thanks to a few blog posts mentioning it in the Active/Flash versions of the monitor report, but I hadn’t figured out how to get the information from the views until now.
This is the generic query:

var sql_id varchar2(100)
exec :sql_id := '2qcnzh2cqfugz'

col value_text for a53
prompt Reducing length of column as we won't be using it here.
col value_text for a10
col description for a100

select up.plan_line_id
      ,sms.name
      ,up.value
      ,dop_down.value_text
      ,sms.description
from   (
select * from (
select sql_id
      ,plan_line_id
      ,otherstat_1_id
      ,otherstat_1_type
      ,otherstat_1_value
      ,otherstat_2_id
      ,otherstat_2_type
      ,otherstat_2_value
      ,otherstat_3_id
      ,otherstat_3_type
      ,otherstat_3_value
      ,otherstat_4_id
      ,otherstat_4_type
      ,otherstat_4_value
      ,otherstat_5_id
      ,otherstat_5_type
      ,otherstat_5_value
      ,otherstat_6_id
      ,otherstat_6_type
      ,otherstat_6_value
      ,otherstat_7_id
      ,otherstat_7_type
      ,otherstat_7_value
      ,otherstat_8_id
      ,otherstat_8_type
      ,otherstat_8_value
      ,otherstat_9_id
      ,otherstat_9_type
      ,otherstat_9_value
      ,otherstat_10_id
      ,otherstat_10_type
      ,otherstat_10_value
from v$sql_plan_monitor spm
where spm.sql_id = :sql_id
and   spm.sql_exec_id = (select max(sql_exec_id) from v$sql_monitor sm where sm.sql_id = :sql_id) -- this will identify the same execution that the report was using
)
unpivot ((id,type,value)
for pivId in ((otherstat_1_id ,otherstat_1_type ,otherstat_1_value ) as 1
             ,(otherstat_2_id ,otherstat_2_type ,otherstat_2_value ) as 2
             ,(otherstat_3_id ,otherstat_3_type ,otherstat_3_value ) as 3
             ,(otherstat_4_id ,otherstat_4_type ,otherstat_4_value ) as 4
             ,(otherstat_5_id ,otherstat_5_type ,otherstat_5_value ) as 5
             ,(otherstat_6_id ,otherstat_6_type ,otherstat_6_value ) as 6
             ,(otherstat_7_id ,otherstat_7_type ,otherstat_7_value ) as 7
             ,(otherstat_8_id ,otherstat_8_type ,otherstat_8_value ) as 8
             ,(otherstat_9_id ,otherstat_9_type ,otherstat_9_value ) as 9
             ,(otherstat_10_id,otherstat_10_type,otherstat_10_value) as 10
             )
       )
) up
left join v$sql_monitor_statname sms
  on up.id = sms.id
left join (select 'downgrade reason' name, 350 value , 'DOP downgrade due to adaptive DOP'                     value_text from dual union all
           select 'downgrade reason' name, 351 value , 'DOP downgrade due to resource manager max DOP'         value_text from dual union all
           select 'downgrade reason' name, 352 value , 'DOP downgrade due to insufficient number of processes' value_text from dual union all
           select 'downgrade reason' name, 353 value , 'DOP downgrade because slaves failed to join'           value_text from dual
          ) dop_down
  on sms.name = dop_down.name
 and up.value = dop_down.value
order by 1,2
/

PLAN_LINE_ID NAME                                VALUE VALUE_TEXT DESCRIPTION
------------ ------------------------------ ---------- ---------- -----------------------------------------------------------------
           1 MERGE deleted row count                 2            count of rows deleted by MERGE
           1 MERGE inserted row count                2            count of rows inserted by MERGE
           1 MERGE updated row count                 3            count of rows updated by MERGE
           4 Build Row Count                         5            Number of rows for the build
           4 Build Size                         491520            Size of the build input in bytes
           4 Fan-out                                 8            Number of partitions used to split both inputs
           4 Slot Size                          122880            Size of an in-memory hash-join slot
           4 Total Build Partitions                  8            Total number of build partitions
           4 Total Cached Partitions                 8            Total number of build partitions left in-memory before probing

We can clearly see that we inserted 2 rows, updated 3 and deleted 2, exactly what our data suggested.

On top of this, we have some cool stats that might be interesting if we were looking into what was going on in line 4 (the hash join).

Be sure to take a look at the other available stats in v$sql_monitor_statname. I’d love to see some examples of these.
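
A simple way to browse them (these are the same columns the big query above joins on):

select id, name, description
from   v$sql_monitor_statname
order  by id;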

Slow dbms_parallel_execute.create_chunks_by_number_col?

I have to apologise for the lack of content in the past few months, I’ve been stretching myself a little too thin. Some of you may know I’m trying to sell my apartment and move to the West Midlands, that’s turned out to be very stressful too! Anyway, I’ve found some time for this quick post so let’s get on with it.

I was using the dbms_parallel_execute package today to chunk up a reconciliation job (comparing the existence of primary key values between two tables in a DW). The only way I could think of comparing the columns without running out of temp (the tables are TBs in size) was to batch up the comparison into value ranges. The easy way to chunk up my number column was with this dbms_parallel_execute procedure, but I ran into a performance problem pretty quickly.
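
For context, each chunk then drives a comparison along these lines (a sketch with hypothetical table names; :start_id and :end_id are supplied by dbms_parallel_execute):

select pk_col from dw_table_a where pk_col between :start_id and :end_id
minus
select pk_col from dw_table_b where pk_col between :start_id and :end_id;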

Here’s a demo, first the setup:

create table as_dpe_number
(pk_col number primary key
,data_col varchar2(100)
);
create sequence as_dpe_number_seq;
declare
 sString varchar2(100) := rpad('x',100,'x');
begin
 for i in 1..100
 loop
 insert into as_dpe_number
 select as_dpe_number_seq.nextval, sString from dual connect by rownum <=10000;
 commit;
 end loop;
end;
/
exec dbms_parallel_execute.create_task('AS_TEST_NUMBER')

I have a table with 1,000,000 rows (which isn’t that much, but this is a demo), and I have setup my parallel execute task. Now I call the chunking procedure:

set timing on
ANDY@pdb1>exec dbms_parallel_execute.create_chunks_by_number_col('AS_TEST_NUMBER',user,'AS_DPE_NUMBER','PK_COL',1000)

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.47

So that took 5.47 seconds to split my 1,000,000 rows into 1,000 chunks of 1,000 rows. What’s the harm? Well, let’s look at where those 5 seconds went, using a quick n’ dirty query against v$active_session_history (you must have paid for the diagnostics pack licence if you want to use it):

select ash.sql_id, substr(s.sql_text,1,52) sql_text, count(*)
from v$active_session_history ash
left join v$sql s
 on ash.sql_id = s.sql_id
 and ash.sql_child_number = s.child_number
where ash.session_id = sys_context('userenv','sid')
and ash.sample_time >= sysdate-interval'10'second
group by ash.sql_id, substr(s.sql_text,1,52)
order by count(*);
SQL_ID        SQL_TEXT                                              COUNT(*)
------------- ----------------------------------------------------- ----------
83b0ca0j15319 INSERT INTO DBMS_PARALLEL_EXECUTE_CHUNKS$( CHUNK_ID,           1
8mdxrp0r8dhx2 select min("PK_COL"), max("PK_COL")                             4

You might recognize the problem immediately if you’ve run into it before. But let’s look at the plan for the SQL that’s taking the time:

ANDY@pdb1>select * from dbms_xplan.display_cursor(sql_id=>'8mdxrp0r8dhx2');

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
SQL_ID 8mdxrp0r8dhx2, child number 0
-------------------------------------
select min("PK_COL"), max("PK_COL") from
"ANDY"."AS_DPE_NUMBER"

Plan hash value: 2623793653

--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |       |       |   546 (100)|          |
|   1 |  SORT AGGREGATE       |              |     1 |    13 |            |          |
|   2 |   INDEX FAST FULL SCAN| SYS_C0013861 |   860K|    10M|   546   (1)| 00:00:01 |
--------------------------------------------------------------------------------------

Note
-----
 - dynamic statistics used: dynamic sampling (level=2)

We’re reading the entire index even though we only need the first and last values; this is because the min and the max are being computed in the same query block. You can appreciate how long that might take with a table that’s 1 TB! Let’s see what happens if we run this instead:

select (select min(pk_col) from as_dpe_number) , (select max(pk_col) from as_dpe_number) from dual;

Well, we get a plan like this:

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 298820326

-------------------------------------------------------------------------------------------
| Id  | Operation                  | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |              |     1 |       |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |              |     1 |    13 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| SYS_C0013861 |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |  SORT AGGREGATE            |              |     1 |    13 |            |          |
|   4 |   INDEX FULL SCAN (MIN/MAX)| SYS_C0013861 |     1 |    13 |     2   (0)| 00:00:01 |
|   5 |  FAST DUAL                 |              |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Note
-----
 - dynamic statistics used: dynamic sampling (level=2)

Those MIN/MAX scans mean that we are jumping straight to the beginning and straight to the end of the index, easy!

So how do we get dbms_parallel_execute to use this? We can’t change the SQL it runs, but we can change the table it runs against:

exec dbms_parallel_execute.drop_task('AS_TEST_NUMBER')
create table as_dpe_number_minmax
as
select min(pk_col) pk_col from as_dpe_number
union all
select max(pk_col) from as_dpe_number
/

Table created.

Elapsed: 00:00:00.05

exec dbms_parallel_execute.create_task('AS_TEST_NUMBER')
exec dbms_parallel_execute.create_chunks_by_number_col('AS_TEST_NUMBER',user,'AS_DPE_NUMBER_MINMAX','PK_COL',1000)

Elapsed: 00:00:00.85

I should note that this works because the package doesn’t go through all the values in your table; it just takes the minimum and maximum values and creates uniform-width chunks. I.e. if you have the values 1, 5, 6, 7, 8, 9, 10 and wanted two chunks, you’d get a sparse chunk covering 1-5 and a much denser chunk covering 6-10.
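
You can eyeball the ranges it generated for this demo’s task to see that uniform width in action:

select chunk_id, start_id, end_id
from   user_parallel_execute_chunks
where  task_name = 'AS_TEST_NUMBER'
order  by start_id;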

Hopefully this comes in handy when you need it, and serves as a nice reminder about the problem with combined select min & max queries.

 

Indexing Nulls by including constants in the index? Careful now

This is just a quick demo to show how Oracle can sometimes get a bit too keen on replacing expressions with declared virtual columns. The problem is one that was seen on the OTN recently https://community.oracle.com/message/14281753#14281753; the basic scenario is that you have a nullable column whose NULL values you wish to index. The problem with indexing NULLs is that there’ll be no entry in an index if ALL of its columns are NULL, so a typical workaround is to introduce another column to the index that won’t be NULL – an easy way of doing this is to use a fixed value. Of course, you would normally choose a small value so that your index isn’t too large; in the past I’ve used 'x'. The OP of this thread chose -999999, which is rather large (byte-wise) but shouldn’t strike you as being particularly bad.

Here’s the problem: if you then use this value as a filter on the table, Oracle can (and will) transform the query to replace your literal with the hidden virtual column (which it created to support the function-based index).

A demo using 12.2.0.1:

ANDY@pdb1>create table careful_of_workarounds
  2  as
  3  select * from all_objects;

Table created.

ANDY@pdb1>alter table careful_of_workarounds modify object_id null;

Table altered.

ANDY@pdb1>create index careful_of_workarounds_idx on careful_of_workarounds (object_id);

Index created.

ANDY@pdb1>select min(object_id), max(object_id) from CAREFUL_OF_WORKAROUNDS;

MIN(OBJECT_ID) MAX(OBJECT_ID)                                                                                                                                                                           
-------------- --------------                                                                                                                                                                           
             2          73632                                                                                                                                                                          

ANDY@pdb1>explain plan for select * from careful_of_workarounds where object_id = 987654321;

Explained.
ANDY@pdb1>select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT                                                                                                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1786522280                                                                                                                                                                             
                                                                                                                                                                                                        
------------------------------------------------------------------------------------------------------------------                                                                                      
| Id  | Operation                           | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                      
------------------------------------------------------------------------------------------------------------------                                                                                      
|   0 | SELECT STATEMENT                    |                            |     1 |   135 |     2   (0)| 00:00:01 |                                                                                      
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CAREFUL_OF_WORKAROUNDS     |     1 |   135 |     2   (0)| 00:00:01 |                                                                                      
|*  2 |   INDEX RANGE SCAN                  | CAREFUL_OF_WORKAROUNDS_IDX |     1 |       |     1   (0)| 00:00:01 |                                                                                      
------------------------------------------------------------------------------------------------------------------                                                                                      
                                                                                                                                                                                                        
Predicate Information (identified by operation id):                                                                                                                                                     
---------------------------------------------------                                                                                                                                                     
                                                                                                                                                                                                        
   2 - access("OBJECT_ID"=987654321)                                                                                                                                                                   

14 rows selected.

ANDY@pdb1>create index careful_of_workarounds_eg on careful_of_workarounds (object_id,987654321);

Index created.

ANDY@pdb1>explain plan for select * from careful_of_workarounds where object_id = 987654321;

Explained.
ANDY@pdb1>select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT                                                                                                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1786522280                                                                                                                                                                             
                                                                                                                                                                                                        
------------------------------------------------------------------------------------------------------------------                                                                                      
| Id  | Operation                           | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                      
------------------------------------------------------------------------------------------------------------------                                                                                      
|   0 | SELECT STATEMENT                    |                            |     1 |   135 |     2   (0)| 00:00:01 |                                                                                      
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CAREFUL_OF_WORKAROUNDS     |     1 |   135 |     2   (0)| 00:00:01 |                                                                                      
|*  2 |   INDEX RANGE SCAN                  | CAREFUL_OF_WORKAROUNDS_IDX |     1 |       |     1   (0)| 00:00:01 |                                                                                      
------------------------------------------------------------------------------------------------------------------                                                                                      
                                                                                                                                                                                                        
Predicate Information (identified by operation id):                                                                                                                                                     
---------------------------------------------------                                                                                                                                                     
                                                                                                                                                                                                        
   2 - access("OBJECT_ID"=987654321)                                                                                                                                                                   

14 rows selected.

ANDY@pdb1>create index careful_of_workarounds_eg2 on careful_of_workarounds (object_id,-987654321);

Index created.

ANDY@pdb1>explain plan for select * from careful_of_workarounds where object_id = -987654321;

Explained.
ANDY@pdb1>select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT                                                                                                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2863211044                                                                                                                                                                             
                                                                                                                                                                                                        
--------------------------------------------------------------------------------------------                                                                                                            
| Id  | Operation         | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                            
--------------------------------------------------------------------------------------------                                                                                                            
|   0 | SELECT STATEMENT  |                        |     1 |   135 |   375   (1)| 00:00:01 |                                                                                                            
|*  1 |  TABLE ACCESS FULL| CAREFUL_OF_WORKAROUNDS |     1 |   135 |   375   (1)| 00:00:01 |                                                                                                            
--------------------------------------------------------------------------------------------                                                                                                            
                                                                                                                                                                                                        
Predicate Information (identified by operation id):                                                                                                                                                     
---------------------------------------------------                                                                                                                                                     
                                                                                                                                                                                                        
   1 - filter("OBJECT_ID"=(-987654321))                                                                                                                                                                

13 rows selected.
ANDY@pdb1>explain plan for select /*+index(careful_of_workarounds CAREFUL_OF_WORKAROUNDS_IDX)*/* from careful_of_workarounds where object_id = -987654321;

Explained.

ANDY@pdb1>select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2312772364

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                            |     1 |   135 |   154   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CAREFUL_OF_WORKAROUNDS     |     1 |   135 |   154   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN                   | CAREFUL_OF_WORKAROUNDS_IDX |     1 |       |   153   (1)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_ID"=(-987654321))
See how we are now forced to use our predicate as a filter rather than an access predicate on the index – causing us to read the ENTIRETY of the index. This seems to only be the case with negative-valued expressions (using -1 did exactly the same), not positive ones. A brief check against dba_tab_cols might shed some light:
ANDY@pdb1>select column_name, data_default, column_id from dba_tab_cols where table_name ='CAREFUL_OF_WORKAROUNDS' and data_default is not null
  2  /

COLUMN_NAME                    DATA_DEFAULT                                        COLUMN_ID
------------------------------ -------------------------------------------------- ----------
SYS_NC00027$                   987654321
SYS_NC00028$                   (-987654321)
SYS_NC00029$                   (-1)
The positive-valued virtual column has no brackets (or parentheses) surrounding it but the negatively valued ones do – perhaps this is what classes the negatives as a replaceable expression rather than a fixed value.
Okay, that might not be so bad because it’s just one column, right? Well…
ANDY@pdb1>create index care_of_workarounds_really on careful_of_workarounds (created_vsnid);

Index created.

ANDY@pdb1>explain plan for select /*+index(careful_of_workarounds care_of_workarounds_really)*/* from careful_of_workarounds where created_vsnid = -987654321;

Explained.

ANDY@pdb1>select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2532905546

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                            |     1 |   135 |     0   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CAREFUL_OF_WORKAROUNDS     |     1 |   135 |     0   (0)| 00:00:01 |
|*  2 |   INDEX FULL SCAN                   | CARE_OF_WORKAROUNDS_REALLY |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CREATED_VSNID"=(-987654321))
Oh dear!
Are you safe if you use binds though?
ANDY@pdb1>var object_id number=-987654321
ANDY@pdb1>select /*+index(careful_of_workarounds careful_of_workarounds_idx)*/* from careful_of_workarounds where object_id = :object_id;

no rows selected

ANDY@pdb1>select * from table(dbms_xplan.display_cursor(format=>'typical'))
  2  /

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  9cjjrakjnam15, child number 0
-------------------------------------
select /*+index(careful_of_workarounds careful_of_workarounds_idx)*/*
from careful_of_workarounds where object_id = :object_id

Plan hash value: 1786522280

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                            |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CAREFUL_OF_WORKAROUNDS     |     1 |   135 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | CAREFUL_OF_WORKAROUNDS_IDX |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=:OBJECT_ID)
Yes! You may have guessed this, but it’s always better to verify. Also, I should clarify that using 'x' doesn’t suffer from the same problem.
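
For reference, the 'x' flavour of the workaround would look like this (index name assumed, not from the demo above):

create index careful_of_workarounds_x on careful_of_workarounds (object_id, 'x');
-- 'x' costs a single byte per index key and, per the behaviour just described,
-- a literal 'x' in a predicate doesn't get rewritten to the hidden virtual
-- column the way the negative number did.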

SQL*Plus 12.2 (so close)

There are a few new features in the latest SQL*Plus client that look like they could be super useful; I haven’t seen much mention of them, so I thought I’d take up the job. The documentation covers the changes in https://docs.oracle.com/database/122/SQPUG/release-changes.htm#SQPUG648 so review the docs for the facts.

HISTORY

Syntax
HIST[ORY] [n RUN | EDIT | DEL[ETE]] | [CLEAR | LIST]
Enables users to run, edit, or delete previously used SQL*Plus, SQL, or PL/SQL commands from the history list in the current session

I’m only going to briefly mention history – it’s not something I see myself using that much, as I already have F7 in Windows. The one benefit this new option has is that it can work with ed, albeit with some manual work.
ANDY@pdb1>set history 5
ANDY@pdb1>select * from dual;

D
-
X

ANDY@pdb1>ed
Wrote file afiedt.buf

  1* select dummy from dual
ANDY@pdb1>/

D
-
X

ANDY@pdb1>history
  1  select * from dual;
  2  ed
     select dummy from dual
     /
  3  /
Most statements are rerunnable by calling
history n run
But see how statement 2 has included the ed command? That prevents you from running it directly; you can call
history 2 edit
and then modify it before running, but you can’t just run the command. Not quite there IMO.

Support for input binding by using the VARIABLE command. The VARIABLE command now supports input binding which can be used in SQL and PL/SQL statements

It has long been a gripe of mine that in order to use bind variables in SQL*Plus, you need to declare them and then run a plsql block in order to set them:
var nNumber number
exec :nNumber := 10
My problem here is that in order to save parsing one statement, you parse another statement instead!
In 12.2, Oracle has extended the variable command to allow you to set the value of the bind as you declare it. So our simple example now becomes
var nNumber number=10
0 parses.
My main problem with the new syntax is that you can’t use defined (substitution) variables, so if you have a script that takes input from the user, you can’t feed that into your binds without the plsql call.
ANDY@pdb1>var nNumber number=&1
 SP2-0425: "&1" is not a valid NUMBER
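
For completeness, the pre-12.2 two-step form still covers that case – at the cost of the PL/SQL parse mentioned above:

var nNumber number
exec :nNumber := &1
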
Whilst we’re on the subject of parses, the next new feature is statement caching. Statement caching is well known in general application programming; it basically makes the difference between a soft parse and no parse (the use of shareable SQL is what saves you the hard parse).
A quick demo of the old behaviour (and default 12.2 behaviour)
ANDY@pdb1>get sql_find
 1  select * from (select sql_id, last_active_time, executions, parse_calls, substr(sql_text,1,100) sql_text
 2  from v$sql where sql_text not like '%v$sql%' and upper(sql_text) like UPPER('%&1%') order by last_active_time desc)
 3* where rownum <=15

alter system flush shared_pool;
var nNumber number
exec :nNumber := 5
select * from dual where :nNumber = 10;
exec :nNumber := 10
select * from dual where :nNumber = 10;

@sql_find ":nNumber"

SQL_ID        LAST_ACTIVE_TIME    EXECUTIONS PARSE_CALLS SQL_TEXT
------------- ------------------- ---------- ----------- ---------------------------------------------
2azpdtw4dhbbn 29/03/2017 00:54:11          1           1 BEGIN :nNumber := 10; END;
fhsvcq3xzfmc1 29/03/2017 00:54:11          1           1 BEGIN :nNumber := 5; END;
2mty35xpvmgph 29/03/2017 00:54:11          2           2 select * from dual where :nNumber = 10

We can see that our parse calls are much higher than you’d expect, and we’ve also got those pesky ones I was talking about earlier: the separate statements for assigning our bind variables were each hard parsed! This is how 12.2 compares:

alter system flush shared_pool;
set statementcache 50
var nNumber number=5
select * from dual where :nNumber = 10;
var nNumber number=10
select * from dual where :nNumber = 10;

@sql_find ":nNumber"

SQL_ID        LAST_ACTIVE_TIME    EXECUTIONS PARSE_CALLS SQL_TEXT
------------- ------------------- ---------- ----------- -----------------------------------------------------------
2mty35xpvmgph 29/03/2017 00:56:08          2           1 select * from dual where :nNumber = 10
All of those parse calls are GONE, fantastic! I’m not going to suggest that your actual production DB has a problem caused by SQL*Plus parsing, but this does give you the ability to demonstrate the effects of some typical application performance enhancements. It certainly won’t harm your shared pool.

SET ROWPREFETCH {1 | n}

There’s one more thing that I want to mention: row prefetching. This is something I wasn’t clear about to start with. The docs describe it as:
"Sets the number of rows that SQL*Plus will prefetch from the database at one time."
If we compare this to the arraysize:
"Sets the number of rows, called a batch, that SQL*Plus will fetch from the database at one time."
What’s the difference between fetching and prefetching? A demo may help.
alter system flush shared_pool;
set arraysize 10
set rowprefetch 50
set pagesize 10
set pause on
The pagesize and pause are there so that we display a few rows at a time in sqlplus, our cursor remains open and we can see what has been fetched from the cursor from another session. I’ve added fetches and rows_processed to my @sql_find script so we can monitor that too.
ANDY@pdb1>select * from dual connect by rownum <=110;
 

At this point we haven't displayed any rows. If we check in our other session

ANDY@pdb1>@sql_find dual%connect

SQL_ID        LAST_ACTIVE_TIME    EXECUTIONS PARSE_CALLS    FETCHES ROWS_PROCESSED SQL_TEXT
------------- ------------------- ---------- ----------- ---------- -------------- -------------------------------------------
40p6csxf1r8p3 29/03/2017 01:07:27          1           0          1             50 select * from dual connect by rownum <=110
Apparently we’ve already processed 50 rows in 1 fetch – that’s our prefetch setting coming into play. What happens when we release our pause and get the next page?
D
-
X
X
X
X
X
X
X

SQL_ID        LAST_ACTIVE_TIME    EXECUTIONS PARSE_CALLS    FETCHES ROWS_PROCESSED SQL_TEXT
------------- ------------------- ---------- ----------- ---------- -------------- ---------------------------------------------
40p6csxf1r8p3 29/03/2017 01:07:27          1           0          1             50 select * from dual connect by rownum <=110
We’ve not done anything with the SQL, but we’ve got 7 rows back! That’s because the rows had already been prefetched by SQL*Plus and were just sitting on the client side; there was no need to request them from the DB. Let’s do a few more pages. Once we’ve read all 50 of our prefetched rows, the next call does another fetch:
SQL_ID        LAST_ACTIVE_TIME    EXECUTIONS PARSE_CALLS    FETCHES ROWS_PROCESSED SQL_TEXT
------------- ------------------- ---------- ----------- ---------- -------------- ---------------------------------------------------
40p6csxf1r8p3 29/03/2017 01:20:58          1           0          2            110 select * from dual connect by rownum <=110
You’ll notice that the second fetch was responsible for 60 rows being processed. Honestly, this was a surprise: 60 is the total of our arraysize and rowprefetch, but it’s also exactly the number of rows left in the query, so this run alone can’t tell us which explanation is right.
SQL_ID        LAST_ACTIVE_TIME    EXECUTIONS PARSE_CALLS    FETCHES ROWS_PROCESSED SQL_TEXT
------------- ------------------- ---------- ----------- ---------- -------------- ---------------------------------------------------
40p6csxf1r8p3 29/03/2017 01:21:19          1           0          3            110 select * from dual connect by rownum <=110
The third fetch didn’t process any rows; this is normal, it’s just the final fetch that marks the cursor as finished.
What happens when we use an arraysize of 5 and a total query size of 200 rows (keeping our rowprefetch at 50)?
Here is the @sql_find results at the point of second fetch:
SQL_ID        LAST_ACTIVE_TIME    EXECUTIONS PARSE_CALLS    FETCHES ROWS_PROCESSED SQL_TEXT
------------- ------------------- ---------- ----------- ---------- -------------- ------------------------------------------------
3rwacgmnkkabu 29/03/2017 01:33:26          1           1          1             50 select * from dual connect by rownum <=200

ANDY@pdb1>/

SQL_ID        LAST_ACTIVE_TIME    EXECUTIONS PARSE_CALLS    FETCHES ROWS_PROCESSED SQL_TEXT
------------- ------------------- ---------- ----------- ---------- -------------- ------------------------------------------------
3rwacgmnkkabu 29/03/2017 01:34:02          1           1          2            105 select * from dual connect by rownum <=200
Yes: the second fetch processes a prefetch plus a fetch worth of rows (50 + 5 = 55, taking us from 50 to 105 rows), so it wasn’t simply the remainder of the result set.
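Putting the two runs together, the pattern appears to be:

first fetch      = rowprefetch
each later fetch = arraysize + rowprefetch

arraysize 10, rowprefetch 50: 10 + 50 = 60 rows  (110 total - 50 prefetched)
arraysize  5, rowprefetch 50:  5 + 50 = 55 rows  (105 total - 50 prefetched)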
Finally, Oracle has baked this new behaviour into a fastmode for SQL*Plus that can be used simply with:
sqlplus -f

  • ARRAYSIZE = 100
  • LOBPREFETCH = 16384
  • PAGESIZE = 50000
  • ROWPREFETCH = 2
  • STATEMENTCACHE = 20
This simply sets those parameters to the values Oracle believes are best for general performance. I can’t see why you wouldn’t use them by default, unless you’d rather tune your own mix; in my honest opinion enabling these options is a no-brainer, and you don’t need to set any of the parameters particularly high to start seeing the benefits.
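If you can’t change how SQL*Plus is invoked, the same settings can go in your login.sql or glogin.sql instead; a minimal sketch, simply copying the fastmode values above:

set arraysize 100
set lobprefetch 16384
set pagesize 50000
set rowprefetch 2
set statementcache 20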
As an aside, the pagesize was a surprise to me; I wonder how noticeable a change in pagesize alone is on real performance.
Perhaps the real improvements would be obvious for those huge reports or table extracts. The markup command has also gained csv output, something that’s been missing for a long time. I plan to benchmark csv extract generation of a big table (e.g. for sqlldr input) using old sqlplus, fastmode sqlplus and SQLcl, but that’s for another day.
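For what it’s worth, the kind of extract I have in mind would look something like this (big_table is a placeholder; set markup csv on is the new 12.2 syntax):

-- extract.sql, run with: sqlplus -f -s user@pdb1 @extract
set markup csv on
set feedback off
spool big_table.csv
select * from big_table;
spool off
exit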
Let me know what you think of the new settings and if you manage to leverage them for substantial improvements.

The golden rule of indexing

This is something I frequently have to remind people of on the OTN forums, so I thought it was worthy of a proper blog post where I can demonstrate the point properly.
So here is this very simple but very important rule:

When you have multiple columns in your index, a predicate on a column can be used for index access only when you have equality predicates on all prior columns in the index

An access predicate is one that can be used to reduce the amount we read from an index; the alternative is a filter predicate, which reduces the amount we return from what we have read. Think how we access a table with an index scan: we only read part of the table. By contrast, we filter a table during a full table scan: we read the entire thing and just return fewer rows. It’s easier to demonstrate with an example, so let’s make a big table with a few columns.
create table golden_indexes as
with gen as (select /*+materialize*/* from dual connect by rownum <=10000)
select rownum pk_col, sysdate-mod(rownum,365) date_col, trunc(dbms_random.value(0,100)) col_100, dbms_random.string('x',100) padding_col
from   gen
cross join gen 
where rownum <=100000000
/

select count(*) from golden_indexes;

  COUNT(*)
----------
 100000000
 
create index golden_indexes_idx1 on golden_indexes (date_col,col_100);
create index golden_indexes_idx2 on golden_indexes (col_100,date_col);
We have 100,000,000 rows in our table with 100 different values of some number and 365 different values of some date.
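If you want to sanity-check that distribution, a quick query (the counts should come back as 100 and 365):

select count(distinct col_100) vals_100, count(distinct date_col) vals_date
from   golden_indexes;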
We now have a query
select date_col, padding_col 
from   golden_indexes
where  col_100  = 50
and    date_col >= trunc(sysdate)-10 
and    date_col <  trunc(sysdate)-9;
Because we know our data, we know that this query is going to find 100,000,000 × 1/100 × 1/365 ≈ 2,700 rows. That makes an index appropriate rather than a full tablescan; we don’t want to read 100 million rows.
Let’s apply our golden rule with this query to our two indexes, first the one leading date_col:
The first column is date_col. We can use our predicate on date_col as an access predicate (there are no prior columns in the index), giving an index selectivity of 1/365. We can’t apply our col_100 predicate as an access predicate because the predicate on date_col is not an equality predicate – it is a range predicate.
Of course, we can still use the col_100 predicate as a filter predicate on the index; it just means we can’t use it to aim our reads against the index. This gives us an index selectivity of 1/365 and a table selectivity of 1/365 × 1/100 = 1/36500.
Now for the second index: we can use our col_100 predicate as an access predicate, giving an index selectivity of 1/100. We can then apply our date_col predicate as an access predicate too, because the prior predicate was an equality, for a total index selectivity of 1/100 × 1/365 = 1/36500. No further filters exist, so our table selectivity is also 1/36500 (as before).
To demonstrate, we can see the number of leaf blocks in the indexes by querying their stats (they’ll be accurate as we’ve only just built them; real-life index stats will probably be less accurate):
select index_name, blevel, leaf_blocks from user_indexes where table_name = 'GOLDEN_INDEXES' order by 1;


INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
GOLDEN_INDEXES_IDX1                     3      306612
GOLDEN_INDEXES_IDX2                     3      306605
We should expect to read about 306612 / 365 ≈ 840 leaf blocks in an index range scan of the first index, and about 306605 / 36500 ≈ 9 leaf blocks in a range scan of the second. Every index range scan starts by traversing the branch blocks of the index, so add one block per blevel; both indexes have a blevel of 3 here.
We can demonstrate this by using row source execution statistics (discussed in https://ctandrewsayer.wordpress.com/2017/03/21/4-easy-lessons-to-enhance-your-performance-diagnostics/)
set serverout off
select /*+index_rs_asc(gi golden_indexes_idx1) NO_INDEX_SS(gi golden_indexes_idx1) gather_plan_statistics*/
       count(*)
from   golden_indexes gi
where  col_100  = 50
and    date_col >= trunc(sysdate)-10 
and    date_col <  trunc(sysdate)-9;
select * from table(dbms_xplan.display_cursor(format=>'allstats typical last'));

Plan hash value: 2147962759

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |      1 |        |       |  1691 (100)|          |      1 |00:00:00.05 |     844 |
|   1 |  SORT AGGREGATE    |                     |      1 |      1 |    11 |            |          |      1 |00:00:00.05 |     844 |
|*  2 |   FILTER           |                     |      1 |        |       |            |          |   2710 |00:00:00.06 |     844 |
|*  3 |    INDEX RANGE SCAN| GOLDEN_INDEXES_IDX1 |      1 |     55 |   605 |  1691   (1)| 00:00:01 |   2710 |00:00:00.05 |     844 |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!)-9>TRUNC(SYSDATE@!)-10)
   3 - access("DATE_COL">=TRUNC(SYSDATE@!)-10 AND "COL_100"=50 AND "DATE_COL"<TRUNC(SYSDATE@!)-9)
       filter("COL_100"=50)

select /*+index(gi golden_indexes_idx2) gather_plan_statistics*/
       count(*)
from   golden_indexes gi
where  col_100  = 50
and    date_col >= trunc(sysdate)-10 
and    date_col <  trunc(sysdate)-9;
select * from table(dbms_xplan.display_cursor(format=>'allstats typical last'));

Plan hash value: 1778390985

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |      1 |        |       |    20 (100)|          |      1 |00:00:00.01 |      12 |
|   1 |  SORT AGGREGATE    |                     |      1 |      1 |    11 |            |          |      1 |00:00:00.01 |      12 |
|*  2 |   FILTER           |                     |      1 |        |       |            |          |   2710 |00:00:00.01 |      12 |
|*  3 |    INDEX RANGE SCAN| GOLDEN_INDEXES_IDX2 |      1 |     55 |   605 |    20   (0)| 00:00:01 |   2710 |00:00:00.01 |      12 |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!)-9>TRUNC(SYSDATE@!)-10)
   3 - access("COL_100"=50 AND "DATE_COL">=TRUNC(SYSDATE@!)-10 AND "DATE_COL"<TRUNC(SYSDATE@!)-9)

The plan helps us out: the predicate information section tells us what was used as an access predicate and what was used as a filter predicate. Unfortunately for us, it also repeats filter predicates inside the access predicates (IMO this is a mistake).

The row source stats demonstrate that the work done by the range scan is much less when the columns used in equality predicates come first in the index. See how the Buffers figures on the index range scan lines are almost exactly what I predicted above. This is fairly conclusive: Oracle was only able to limit what it read from the index using a predicate when the prior predicates were equalities.
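To spell the arithmetic out:

GOLDEN_INDEXES_IDX1: 3 branch blocks + 306612/365 ≈ 840 leaf blocks ≈ 843   (actual Buffers: 844)
GOLDEN_INDEXES_IDX2: 3 branch blocks + 306605/36500 ≈ 9 leaf blocks  = 12   (actual Buffers: 12)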

You’ll notice I’ve used a no_index_ss hint in the first query. Without it, the CBO decided it was a good idea to skip scan the index: there are few distinct values in our range, so it could just skip between them (which is sensible here, but probably not realistic where you have dates whose time elements vary). I’ll make a note of this and hopefully give it the attention it deserves in another post.
Something else that has a large influence on the cost of using the two indexes here is the clustering factor. We are visiting 2710 rows (although the optimizer predicted only 55), and it would certainly be better if those rows were clustered in a few blocks rather than spread across 2710 different blocks. Because my data was generated so that the interesting rows are scattered all over the table, the clustering factors of both indexes were huge. I didn’t share these stats as they would certainly not be representative of your real data; my table took hours to generate and index on my little Surface Pro and I didn’t have time to make it more realistic – another time (or homework for the reader)?
The problem with real data is that an index leading with a date column (which usually relates in some way to insert date) will typically have a much lower clustering factor than one leading with your equality-predicate column. In reality you will visit the same table blocks whichever index you use, so arguably both clustering factors should be the same. If you expect to return a large number of rows from an index and you find that the CBO tries to avoid the genuinely best index, you may need to adjust its clustering factor manually: copy the clustering factor from the reverse-ordered index, or increase the clustering factor of the “bad” index (or just don’t create it!). 12c introduced some new functionality for clustering factor statistics gathering that I haven’t yet played with – maybe Oracle has solved this problem; I will update if I find a nice solution.
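A minimal sketch of the manual fix, assuming golden_indexes_idx1 is the “bad” index and we want it to carry the clustering factor of golden_indexes_idx2 (set_index_stats overwrites the dictionary stats, so test carefully; parameters left null are not changed):

declare
  l_numrows number; l_numlblks number; l_numdist  number;
  l_avglblk number; l_avgdblk  number; l_clstfct  number; l_indlevel number;
begin
  -- read the stats of the index whose clustering factor we trust
  dbms_stats.get_index_stats(ownname => user, indname => 'GOLDEN_INDEXES_IDX2',
    numrows => l_numrows, numlblks => l_numlblks, numdist => l_numdist,
    avglblk => l_avglblk, avgdblk => l_avgdblk, clstfct => l_clstfct,
    indlevel => l_indlevel);
  -- copy just the clustering factor onto the other index
  dbms_stats.set_index_stats(ownname => user, indname => 'GOLDEN_INDEXES_IDX1',
    clstfct => l_clstfct);
end;
/

The 12c functionality I had in mind is presumably the TABLE_CACHED_BLOCKS statistics preference, e.g. exec dbms_stats.set_table_prefs(user, 'GOLDEN_INDEXES', 'TABLE_CACHED_BLOCKS', '16') before regathering, which makes the clustering factor calculation more forgiving of rows spread over recently visited blocks.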

Summary

Stick the columns you use range predicates on last in the index; columns that get equality predicates should come first.
Of course, there are always rare cases where this doesn’t apply. One example: if you wish to support order by queries without a sort, then the columns should be in the same order as they appear in your order by clause.
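A quick sketch of that exception using our table (golden_indexes_idx2 leads on the equality column, so the rows come back already ordered):

select /*+ index(gi golden_indexes_idx2) */ date_col
from   golden_indexes gi
where  col_100 = 50
order  by date_col;

Because the index is on (col_100, date_col) and col_100 is fixed by an equality predicate, an index range scan returns the rows in date_col order and the plan needs no SORT ORDER BY step.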
Aside
You’ll notice I changed my query to do a count(*) rather than selecting actual columns from the table; this was not just to reduce output. When going between the index and the table, Oracle releases its pins on the index buffers, which means you’ll see more consistent reads in this situation. When Oracle uses table access by batched rowids (new in 12.1), it will keep reading from the index until it has a sufficient batch of rowids (I’m not completely clear on the full mechanics) before going to the table, so we only rarely have to release and re-get our index leaf blocks.
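If you want to see the batching for yourself, run the query selecting real columns and check the plan for the batched variant of the table access operation; something like this (plan shape only, I haven’t reproduced the full statistics):

select /*+ index(gi golden_indexes_idx2) gather_plan_statistics */
       date_col, padding_col
from   golden_indexes gi
where  col_100  = 50
and    date_col >= trunc(sysdate)-10
and    date_col <  trunc(sysdate)-9;

-- expect to see, in 12.1 onwards:
-- TABLE ACCESS BY INDEX ROWID BATCHED | GOLDEN_INDEXES
--    INDEX RANGE SCAN                 | GOLDEN_INDEXES_IDX2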