DML and SQL Plan Projections

This is just a quick note; I haven't fully investigated all of the possibilities, but this is something I've never seen mentioned before. Just so we're all on the same page, SQL Plan Projections are the mess you see at the bottom of a plan when you use dbms_xplan with a format option of advanced (without "-projection"):

ANDY@pdb1>set feedback only
ANDY@pdb1>select object_id, object_name, created, subobject_name from small_ish_tab;

1000 rows selected.

ANDY@pdb1>set feedback on
ANDY@pdb1>select * from table(dbms_xplan.display_cursor(format=>'advanced'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
SQL_ID  57wtc8p3v9x1b, child number 0
-------------------------------------
select object_id, object_name, created, subobject_name from
small_ish_tab

Plan hash value: 4205268609

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |       |       |     7 (100)|          |
|   1 |  TABLE ACCESS FULL| SMALL_ISH_TAB |  1000 | 94000 |     7   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / SMALL_ISH_TAB@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "SMALL_ISH_TAB"@"SEL$1")
      END_OUTLINE_DATA
  */

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_NAME"[VARCHAR2,128], "SUBOBJECT_NAME"[VARCHAR2,128],
       "OBJECT_ID"[NUMBER,22], "CREATED"[DATE,7]


39 rows selected.
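
I haven't included the DDL for the demo objects, but a setup along these lines would give you something to play with (a sketch only; the second index name is a guess, and note the index on object_name, which matters for the cpy entry we'll meet later):

create table small_ish_tab as
select object_id, object_name, created, subobject_name
from   dba_objects
where  rownum <= 1000;

create index small_ish_idx1 on small_ish_tab (object_id);
create index small_ish_name on small_ish_tab (object_name);

The @last script used below is just a local wrapper around something like select * from table(dbms_xplan.display_cursor(format=>'allstats last advanced')).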

If you are doing DML then you might think this part of the plan is less than useless, but it turns out it carries some key information. Check this out:

ANDY@pdb1>update /*full(t)*/small_ish_tab t set object_name = ' ',created=sysdate where object_id < 10;

8 rows updated.

ANDY@pdb1>@last

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1tvtm0tbuygrj, child number 0
-------------------------------------
update /*full(t)*/small_ish_tab t set object_name = ' ',created=sysdate
where object_id < 10

Plan hash value: 3765727017

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT  |                |      1 |        |       |     3 (100)|          |      0 |00:00:00.01 |      22 |
|   1 |  UPDATE           | SMALL_ISH_TAB  |      1 |        |       |            |          |      0 |00:00:00.01 |      22 |
|*  2 |   INDEX RANGE SCAN| SMALL_ISH_IDX1 |      1 |      7 |   196 |     2   (0)| 00:00:01 |      8 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - UPD$1
   2 - UPD$1 / T@UPD$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"UPD$1")
      INDEX(@"UPD$1" "T"@"UPD$1" ("SMALL_ISH_TAB"."OBJECT_ID"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<10)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   2 - (upd=2,4; cmp=3; cpy=2) "T".ROWID[ROWID,10], "OBJECT_NAME"[VARCHAR2,128], "OBJECT_ID"[NUMBER,22],
       "CREATED"[DATE,7]

Note that little section at the beginning of our projection:

(upd=2,4; cmp=3; cpy=2)

We can take a guess at what this is telling us: upd says we are updating the second and fourth columns listed (we are), and cmp says we are comparing against the third column listed (we are, in the where clause). The cpy entry is a bit of a riddle, as we're not obviously copying that column… The special thing about this column is that it's indexed. What happens if we create an index on created too?

ANDY@pdb1>create index SMALL_ISH_created on SMALL_ISH_tab (created);

Index created.

ANDY@pdb1>update /*full(t)*/small_ish_tab t set object_name = ' ',created=sysdate where object_id < 10;

8 rows updated.

ANDY@pdb1>@last

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1tvtm0tbuygrj, child number 0
-------------------------------------
update /*full(t)*/small_ish_tab t set object_name = ' ',created=sysdate
where object_id < 10

Plan hash value: 3765727017

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT  |                |      1 |        |       |     3 (100)|          |      0 |00:00:00.03 |      26 |      3 |
|   1 |  UPDATE           | SMALL_ISH_TAB  |      1 |        |       |            |          |      0 |00:00:00.03 |      26 |      3 |
|*  2 |   INDEX RANGE SCAN| SMALL_ISH_IDX1 |      1 |      7 |   196 |     2   (0)| 00:00:01 |      8 |00:00:00.01 |       2 |      0 |
---------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - UPD$1
   2 - UPD$1 / T@UPD$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"UPD$1")
      INDEX(@"UPD$1" "T"@"UPD$1" ("SMALL_ISH_TAB"."OBJECT_ID"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<10)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   2 - (upd=2,4; cmp=3; cpy=2,4) "T".ROWID[ROWID,10], "OBJECT_NAME"[VARCHAR2,128], "OBJECT_ID"[NUMBER,22], "CREATED"[DATE,7]


45 rows selected.

It now says cpy=2,4: the created column has been added to this list (I'm not sure of the real name to use here). This suggests the cpy entry is telling us which updated columns could be modifying indexes.

Let's take it a step further: what if we had multiple indexes covering one of the columns we're updating?

ANDY@pdb1>create index SMALL_ISH_created2 on SMALL_ISH_tab (created,subobject_name);

Index created.

ANDY@pdb1>update /*full(t)*/small_ish_tab t set object_name = ' ',created=sysdate where object_id < 10;

8 rows updated.

ANDY@pdb1>@last

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1tvtm0tbuygrj, child number 0
-------------------------------------
update /*full(t)*/small_ish_tab t set object_name = ' ',created=sysdate
where object_id < 10

Plan hash value: 3765727017

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT  |                |      1 |        |       |     3 (100)|          |      0 |00:00:00.01 |      23 |      4 |
|   1 |  UPDATE           | SMALL_ISH_TAB  |      1 |        |       |            |          |      0 |00:00:00.01 |      23 |      4 |
|*  2 |   INDEX RANGE SCAN| SMALL_ISH_IDX1 |      1 |      7 |   658 |     2   (0)| 00:00:01 |      8 |00:00:00.01 |       2 |      0 |
---------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - UPD$1
   2 - UPD$1 / T@UPD$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"UPD$1")
      INDEX(@"UPD$1" "T"@"UPD$1" ("SMALL_ISH_TAB"."OBJECT_ID"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<10)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   2 - (upd=2,5; cmp=4; cpy=2,3,5) "T".ROWID[ROWID,10], "OBJECT_NAME"[VARCHAR2,128], "T"."SUBOBJECT_NAME"[VARCHAR2,128],
       "OBJECT_ID"[NUMBER,22], "CREATED"[DATE,7]


46 rows selected.

SUBOBJECT_NAME is now part of the projection, as we'll need it to update our new index correctly. It's also been added to the cpy list.

So, can we use cpy to figure out how many indexes our update is going to touch? Not quite: all we get is the set of columns involved in the indexes that cover the updated columns. If a column is included in multiple indexes and the other columns of those indexes are already listed, there's no way for the projection to indicate it. Additionally, if there were an index on six columns and just one of them were an updated column, all six of those columns would appear.

ANDY@pdb1>create index SMALL_ISH_created3 on SMALL_ISH_tab (subobject_name, created);

Index created.

ANDY@pdb1>update /*full(t)*/small_ish_tab t set object_name = ' ',created=sysdate where object_id < 10;

8 rows updated.

ANDY@pdb1>@last

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1tvtm0tbuygrj, child number 0
-------------------------------------
update /*full(t)*/small_ish_tab t set object_name = ' ',created=sysdate
where object_id < 10

Plan hash value: 3765727017

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT  |                |      1 |        |       |     3 (100)|          |      0 |00:00:00.01 |      24 |      1 |
|   1 |  UPDATE           | SMALL_ISH_TAB  |      1 |        |       |            |          |      0 |00:00:00.01 |      24 |      1 |
|*  2 |   INDEX RANGE SCAN| SMALL_ISH_IDX1 |      1 |      7 |   658 |     2   (0)| 00:00:01 |      8 |00:00:00.01 |       2 |      0 |
---------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - UPD$1
   2 - UPD$1 / T@UPD$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"UPD$1")
      INDEX(@"UPD$1" "T"@"UPD$1" ("SMALL_ISH_TAB"."OBJECT_ID"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<10)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   2 - (upd=2,5; cmp=4; cpy=2,3,5) "T".ROWID[ROWID,10], "OBJECT_NAME"[VARCHAR2,128], "T"."SUBOBJECT_NAME"[VARCHAR2,128],
       "OBJECT_ID"[NUMBER,22], "CREATED"[DATE,7]


46 rows selected.

We now have two indexes covering created and subobject_name, one index covering only created, and one index covering object_name, but each column has been listed only once.
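
Of course, when you do have access to the data dictionary, you can cross-check which indexes cover which columns directly rather than reading tea leaves:

select index_name, column_name, column_position
from   user_ind_columns
where  table_name = 'SMALL_ISH_TAB'
order  by index_name, column_position;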

Nevertheless, I think this is a nice indicator, and I will certainly be using it as a crystal ball when I'm given the plan projections but not a list of indexes.

The lesson here is that you never know what helpful information might be hiding in the mess you could be discarding.

You might have noticed the (almost) hint for a full tablescan of the table; the comment is missing the + that would make it a real hint. It was there because I was in the middle of looking into something else when I noticed this behaviour; the method of accessing the table being updated doesn't seem to make a difference here (unlike in https://jonathanlewis.wordpress.com/2006/11/22/tuning-updates/).

Conditional Outer Joins forcing Nested Loops

This is yet another post inspired by a problem on the ODC https://community.oracle.com/message/14839371#14839371. The OP hasn’t mentioned any performance problem, but this is a classic recipe for issues. We want to left join to another table but only when a condition against our driving table has been met.

Let’s copy the table from the post and demo the conditional outer join scenario (I’ve changed the data types to remove a few other quirks):

drop table oe_order_headers_a;
create table oe_order_headers_a  
(header_id         number   not null
,order_number      number   not null
,order_source_id   number
,attribute20       number
);  
  
insert into oe_order_headers_a(header_id, order_number, order_source_id, attribute20) values (1411757, 10000103, null, null );  
insert into oe_order_headers_a(header_id, order_number, order_source_id, attribute20) values (1411758, 50000408, 1021, 1411757);  
insert into oe_order_headers_a(header_id, order_number, order_source_id, attribute20) values (1411759, 50000409, 2, 1411757);  
insert into oe_order_headers_a(header_id, order_number, order_source_id, attribute20) values (1411760, 50000410, null, null);  
insert into oe_order_headers_a(header_id, order_number, order_source_id, attribute20) values (1411761, 10000105, 2, 1411761);  
insert into oe_order_headers_a(header_id, order_number, order_source_id, attribute20) values (1411762, 50000411, 1021, 1411762);  
insert into oe_order_headers_a(header_id, order_number, order_source_id, attribute20) values (1411763, 50000412, 12, 1411763);  
   
select oh.order_number  
      ,oh.order_source_id  
      ,oh2.order_number parent_order_number  
from   oe_order_headers_a oh  
left join oe_order_headers_a oh2 
  on  (oh.order_source_id = 1021 and oh.attribute20 = oh2.header_id)  
where  oh.order_number between 50000000 and 60000000;


ORDER_NUMBER ORDER_SOURCE_ID PARENT_ORDER_NUMBER
------------ --------------- -------------------
    50000408            1021            10000103
    50000411            1021            50000411
    50000412              12
    50000409               2
    50000410

And if we look at the explain plan and some stats (of a second execution) via SQL*Plus autotrace, we get:

ANDY@pdb1>set autotrace on
ANDY@pdb1>/

Execution Plan
----------------------------------------------------------
Plan hash value: 2432841757

-----------------------------------------------------------------------------------------
| Id  | Operation          | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                    |     5 |   325 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN OUTER   |                    |     5 |   325 |     7  (15)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| OE_ORDER_HEADERS_A |     5 |   195 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| OE_ORDER_HEADERS_A |     7 |   182 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OH"."ATTRIBUTE20"="OH2"."HEADER_ID"(+) AND
              "OH"."ORDER_SOURCE_ID"=CASE  WHEN ("OH2"."HEADER_ID"(+) IS NOT NULL) THEN 1021
              ELSE 1021 END )
   2 - filter("OH"."ORDER_NUMBER">=50000000 AND "OH"."ORDER_NUMBER"<=60000000)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        825  bytes sent via SQL*Net to client
        572  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

The plan looks okay; we're doing a full scan of the same table twice, but in reality we would probably have two different tables here. If you take a look at the predicates section, you'll notice that Uncle Oracle has done something very clever: it has transformed our conditional outer join condition (the oh.order_source_id = 1021 bit) into another equality join condition:

 "OH"."ORDER_SOURCE_ID"=CASE  WHEN ("OH2"."HEADER_ID"(+) IS NOT NULL) THEN 1021
              ELSE 1021 END )

This is what allows us to do the hash join; you're not allowed to do a hash join without equality conditions – after all, you have to hash some expression and compare it to the hashed expression from the other side.

Let's take it a step further: what if we also wanted to outer join to our other table when our order_source_id is 1022? We would instinctively just change our equality condition into an IN condition, right? Well, here's what happens:

ANDY@pdb1>select oh.order_number
  2        ,oh.order_source_id
  3        ,oh2.order_number parent_order_number
  4  from   oe_order_headers_a oh
  5  left join oe_order_headers_a oh2
  6    on  (oh.order_source_id in (1021,1022) and oh.attribute20 = oh2.header_id)
  7  where  oh.order_number between 50000000 and 60000000
  8  /

ORDER_NUMBER ORDER_SOURCE_ID PARENT_ORDER_NUMBER
------------ --------------- -------------------
    50000408            1021            10000103
    50000409               2
    50000410
    50000411            1021            50000411
    50000412              12


Execution Plan
----------------------------------------------------------
Plan hash value: 1726314521

-------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |     5 |   260 |    18   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER  |                    |     5 |   260 |    18   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | OE_ORDER_HEADERS_A |     5 |   195 |     3   (0)| 00:00:01 |
|   3 |   VIEW               |                    |     1 |    13 |     3   (0)| 00:00:01 |
|*  4 |    FILTER            |                    |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| OE_ORDER_HEADERS_A |     1 |    26 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OH"."ORDER_NUMBER">=50000000 AND "OH"."ORDER_NUMBER"<=60000000)
   4 - filter("OH"."ORDER_SOURCE_ID"=1021 OR "OH"."ORDER_SOURCE_ID"=1022)
   5 - filter("OH"."ATTRIBUTE20"="OH2"."HEADER_ID")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         23  consistent gets
          0  physical reads
          0  redo size
        826  bytes sent via SQL*Net to client
        572  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

We are now doing a nested loop to that full tablescan of our outer table! Taking a look at the predicates, we can see that Uncle Oracle is no longer working his magic for us: we have a filter predicate on line 4 followed by the join predicate on line 5. On the bright side, we only do that full table scan once per driving-table row that matches our conditional join predicate, but that is still twice as many scans as we really need.

I'll note that if our join was on something that could use an index and was happy running via nested loop, then we wouldn't have a problem.

How do we address this? We do the work for the optimizer: we transform that conditional join predicate into an equality join predicate ourselves.

If we try to write this the way Oracle rewrote the first query internally, it's easy to see why it wasn't able to this time: it's difficult! The best I could come up with following the pattern was:

select oh.order_number  
      ,oh.order_source_id  
      ,oh2.order_number parent_order_number  
from   oe_order_headers_a oh  
      ,oe_order_headers_a oh2 
where  oh.attribute20 = oh2.header_id (+)
and    oh.order_source_id in (case  when (oh2.header_id(+) is not null) then 1021 else 1021 end , case  when (oh2.header_id(+) is not null) then 1022 else 1022 end)
and    oh.order_number between 50000000 and 60000000;

ERROR at line 7:
ORA-01719: outer join operator (+) not allowed in operand of OR or IN

This doesn't even solve our equality requirement; we would have to expand the IN into two union all branches… it's just not worth it.
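
For completeness, a correct union all expansion might look something like this (a sketch I haven't benchmarked; note the extra predicates needed to recover the unmatched and null rows, which rather proves the point):

select oh.order_number, oh.order_source_id, oh2.order_number parent_order_number
from   oe_order_headers_a oh
join   oe_order_headers_a oh2
  on   oh.attribute20 = oh2.header_id
where  oh.order_source_id in (1021,1022)
and    oh.order_number between 50000000 and 60000000
union all
select oh.order_number, oh.order_source_id, null
from   oe_order_headers_a oh
where  oh.order_number between 50000000 and 60000000
and    (   oh.order_source_id not in (1021,1022)
        or oh.order_source_id is null
        or not exists (select null
                       from   oe_order_headers_a oh2
                       where  oh2.header_id = oh.attribute20));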

Luckily, there is a more obvious transformation: apply the CASE to the left-hand side of the join condition instead. I would naturally write it like this:

select oh.order_number  
      ,oh.order_source_id  
      ,oh2.order_number parent_order_number  
from   oe_order_headers_a oh  
left join oe_order_headers_a oh2 
  on  (case when oh.order_source_id in (1021,1022) then oh.attribute20 else null end = oh2.header_id)  
where  oh.order_number between 50000000 and 60000000;


Execution Plan
----------------------------------------------------------
Plan hash value: 2432841757

-----------------------------------------------------------------------------------------
| Id  | Operation          | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                    |     5 |   325 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN OUTER   |                    |     5 |   325 |     7  (15)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| OE_ORDER_HEADERS_A |     5 |   195 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| OE_ORDER_HEADERS_A |     7 |   182 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OH2"."HEADER_ID"(+)=CASE "OH"."ORDER_SOURCE_ID" WHEN 1021 THEN
              "OH"."ATTRIBUTE20" WHEN 1022 THEN "OH"."ATTRIBUTE20" ELSE NULL END )
   2 - filter("OH"."ORDER_NUMBER">=50000000 AND "OH"."ORDER_NUMBER"<=60000000)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        825  bytes sent via SQL*Net to client
        572  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

We're back to the hash join and its performance from before: bingo!

What jobs will be running over a certain period?

There was a question that recently appeared on the ODC that looked like it could have been about figuring out which jobs will be running during a certain period (next Monday between 8am and 12pm). Unfortunately, my guess at interpreting the question was a little off, and they were really asking how to check their historic logs…

I did think of (what I think is) a neat way of figuring out when jobs scheduled with dbms_scheduler will run, even when they’re on a complicated schedule, so I’ve decided to share that here. The trick is to get dbms_scheduler to do the hard bit for you!

The basis of the method is to use

dbms_scheduler.evaluate_calendar_string

With this procedure, you pass in the calendar string, optionally the start date of the schedule (which anchors the calendar), and the date you want the next run to be after; the procedure hands that next run date back through an OUT parameter.
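
For reference, the signature is essentially this (paraphrasing the dbms_scheduler documentation):

procedure evaluate_calendar_string
            (calendar_string   in  varchar2
            ,start_date        in  timestamp with time zone
            ,return_date_after in  timestamp with time zone
            ,next_run_date     out timestamp with time zone);

Let's quickly demonstrate with a few generated demo jobs: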

declare
  sInterval varchar2(4000);
begin 
  for i in 1..10 
  loop
    if i <= 4 then
      sInterval := 'freq=daily;interval='||i;
    else
      sInterval := 'freq=hourly;interval='||i;
    end if;
    dbms_scheduler.create_job(job_name             =>'demo_job_'||i
                             ,job_type             =>'PLSQL_BLOCK'
                             ,job_action           => 'begin null; end;'
                             ,start_date           => sysdate
                             ,repeat_interval      => sInterval
                             ,enabled              => true
                             ,comments             =>'demo'
                             );
  end loop;
end;
/
select job_name, repeat_interval
from   user_scheduler_jobs
where  comments = 'demo'
order by to_number(substr(job_name,10));

JOB_NAME                       REPEAT_INTERVAL
------------------------------ -----------------------------
DEMO_JOB_1                     freq=daily;interval=1
DEMO_JOB_2                     freq=daily;interval=2
DEMO_JOB_3                     freq=daily;interval=3
DEMO_JOB_4                     freq=daily;interval=4
DEMO_JOB_5                     freq=hourly;interval=5
DEMO_JOB_6                     freq=hourly;interval=6
DEMO_JOB_7                     freq=hourly;interval=7
DEMO_JOB_8                     freq=hourly;interval=8
DEMO_JOB_9                     freq=hourly;interval=9
DEMO_JOB_10                    freq=hourly;interval=10

Now let's use evaluate_calendar_string to see when these jobs are next going to execute; let's pretend they have all only just executed. Because this is a procedure rather than a function, we can't call it straight from SQL; pre-12c, we would have had to do something like this:

set serverout on
declare
  tNextRun timestamp with time zone;
begin
  dbms_output.put_line('Start date:'||to_char(systimestamp,'dd/mm/yyyy hh24:mi:ss TZR'));
  for rJobs in (select job_name, repeat_interval 
                from   user_scheduler_jobs
                where  comments = 'demo'
                order by to_number(substr(job_name,10))
               )
  loop       
    dbms_scheduler.evaluate_calendar_string (calendar_string   => rJobs.repeat_interval
                                            ,start_date        => null
                                            ,return_date_after => systimestamp
                                            ,next_run_date     => tNextRun
                                            );  
    dbms_output.put_line('Job: '||rJobs.job_name||'; next run:'||to_char(tNextRun,'dd/mm/yyyy hh24:mi:ss TZR'));
  end loop;
end;
/  

Start date:02/06/2018 17:03:43 +01:00
Job: DEMO_JOB_1; next run:03/06/2018 17:03:44 +01:00
Job: DEMO_JOB_2; next run:04/06/2018 17:03:44 +01:00
Job: DEMO_JOB_3; next run:05/06/2018 17:03:44 +01:00
Job: DEMO_JOB_4; next run:06/06/2018 17:03:44 +01:00
Job: DEMO_JOB_5; next run:02/06/2018 22:03:44 +01:00
Job: DEMO_JOB_6; next run:02/06/2018 23:03:44 +01:00
Job: DEMO_JOB_7; next run:03/06/2018 00:03:44 +01:00
Job: DEMO_JOB_8; next run:03/06/2018 01:03:44 +01:00
Job: DEMO_JOB_9; next run:03/06/2018 02:03:44 +01:00
Job: DEMO_JOB_10; next run:03/06/2018 03:03:44 +01:00

PL/SQL procedure successfully completed.

To find the next run after this, we would have to evaluate the calendar string a second time, and so on until we reach the end of the period we care about. We could write this within another PL/SQL loop, but the format of the results would not make it easy to work with.
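
Something like this, for a single hard-coded calendar string (a minimal sketch):

declare
  tNextRun timestamp with time zone := systimestamp;
begin
  -- walk forward through all the runs in the next 24 hours
  loop
    dbms_scheduler.evaluate_calendar_string('freq=hourly;interval=5', null, tNextRun, tNextRun);
    exit when tNextRun > systimestamp + interval '1' day;
    dbms_output.put_line(to_char(tNextRun,'dd/mm/yyyy hh24:mi:ss TZR'));
  end loop;
end;
/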

Luckily, with 12c we can use PL/SQL within our SQL via a WITH clause; this makes our job super easy: we just need a function version of our procedure call. Let's quickly convert what we just did to use this method:

with function evaluateCalendarString(psCalendarString  in varchar2
                                ,ptStartDate       in timestamp with time zone
                                ,ptReturnDateAfter in timestamp with time zone
                                ) return timestamp with time zone
is
  tNextRun timestamp with time zone;
begin
  dbms_scheduler.evaluate_calendar_string (calendar_string   => psCalendarString  
                                          ,start_date        => ptStartDate       
                                          ,return_date_after => ptReturnDateAfter 
                                          ,next_run_date     => tNextRun
                                          );  
  return tNextRun;
end;
select job_name, repeat_interval, evaluateCalendarString(repeat_interval,null,systimestamp) next_run
from   user_scheduler_jobs
where  comments = 'demo'
order by to_number(substr(job_name,10));
/

JOB_NAME                       REPEAT_INTERVAL                NEXT_RUN
------------------------------ ------------------------------ ---------------------------------------------------
DEMO_JOB_1                     freq=daily;interval=1          03-JUN-18 17.04.10.367000000 +01:00
DEMO_JOB_2                     freq=daily;interval=2          04-JUN-18 17.04.10.367000000 +01:00
DEMO_JOB_3                     freq=daily;interval=3          05-JUN-18 17.04.10.367000000 +01:00
DEMO_JOB_4                     freq=daily;interval=4          06-JUN-18 17.04.10.367000000 +01:00
DEMO_JOB_5                     freq=hourly;interval=5         02-JUN-18 22.04.10.367000000 +01:00
DEMO_JOB_6                     freq=hourly;interval=6         02-JUN-18 23.04.10.367000000 +01:00
DEMO_JOB_7                     freq=hourly;interval=7         03-JUN-18 00.04.10.367000000 +01:00
DEMO_JOB_8                     freq=hourly;interval=8         03-JUN-18 01.04.10.367000000 +01:00
DEMO_JOB_9                     freq=hourly;interval=9         03-JUN-18 02.04.10.367000000 +01:00
DEMO_JOB_10                    freq=hourly;interval=10        03-JUN-18 03.04.10.367000000 +01:00

(If you’re not on 12c, then you would have to create that function properly.)
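
That would just be the standalone equivalent of the inline function, something like:

create or replace function evaluateCalendarString(psCalendarString  in varchar2
                                                 ,ptStartDate       in timestamp with time zone
                                                 ,ptReturnDateAfter in timestamp with time zone
                                                 ) return timestamp with time zone
is
  tNextRun timestamp with time zone;
begin
  dbms_scheduler.evaluate_calendar_string (calendar_string   => psCalendarString
                                          ,start_date        => ptStartDate
                                          ,return_date_after => ptReturnDateAfter
                                          ,next_run_date     => tNextRun
                                          );
  return tNextRun;
end;
/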

The next step is to use recursive subquery factoring to feed the result of our function back into itself. I'm limiting the recursion to 3 levels per job_name just to demonstrate the results. If you've never used recursive subquery factoring, there's no need to be intimidated: all you need is a with clause that references itself in a union all. It's not my intention to cover the fundamentals of the technique here; others have done it to death, and Google is your friend.
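
The basic shape is tiny; this (unrelated) example just counts to 5:

with r (n) as (
  select 1 from dual
  union all
  select n + 1 from r where n < 5
)
select n from r;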

with function evaluateCalendarString(psCalendarString  in varchar2
                                ,ptStartDate       in timestamp with time zone
                                ,ptReturnDateAfter in timestamp with time zone
                                ) return timestamp with time zone
is
  tNextRun timestamp with time zone;
begin
  dbms_scheduler.evaluate_calendar_string (calendar_string   => psCalendarString  
                                          ,start_date        => ptStartDate       
                                          ,return_date_after => ptReturnDateAfter 
                                          ,next_run_date     => tNextRun
                                          );  
  return tNextRun;
end;
  jobSchedule (job_name, repeat_interval, next_run, lvl)
    as (select job_name, repeat_interval, evaluateCalendarString(repeat_interval,null,systimestamp) next_run, 1 lvl
        from   user_scheduler_jobs
        where  comments = 'demo'
      union all 
        select job_name, repeat_interval, evaluateCalendarString(repeat_interval,null,next_run) next_run, lvl+1 lvl
        from   jobSchedule
        where  lvl < 3
       )
select *
from   jobSchedule
order by to_number(substr(job_name,10)), lvl;
/

JOB_NAME      REPEAT_INTERVAL                NEXT_RUN                           LVL
------------- ------------------------------ --------------------------- ----------
DEMO_JOB_1    freq=daily;interval=1          03-JUN-18 17.09.27 +01:00            1
DEMO_JOB_1    freq=daily;interval=1          04-JUN-18 17.09.27 +01:00            2
DEMO_JOB_1    freq=daily;interval=1          05-JUN-18 17.09.27 +01:00            3
DEMO_JOB_2    freq=daily;interval=2          04-JUN-18 17.09.27 +01:00            1
DEMO_JOB_2    freq=daily;interval=2          06-JUN-18 17.09.27 +01:00            2
DEMO_JOB_2    freq=daily;interval=2          08-JUN-18 17.09.27 +01:00            3
DEMO_JOB_3    freq=daily;interval=3          05-JUN-18 17.09.27 +01:00            1
DEMO_JOB_3    freq=daily;interval=3          08-JUN-18 17.09.27 +01:00            2
DEMO_JOB_3    freq=daily;interval=3          11-JUN-18 17.09.27 +01:00            3
DEMO_JOB_4    freq=daily;interval=4          06-JUN-18 17.09.27 +01:00            1
DEMO_JOB_4    freq=daily;interval=4          10-JUN-18 17.09.27 +01:00            2
DEMO_JOB_4    freq=daily;interval=4          14-JUN-18 17.09.27 +01:00            3
DEMO_JOB_5    freq=hourly;interval=5         02-JUN-18 22.09.27 +01:00            1
DEMO_JOB_5    freq=hourly;interval=5         03-JUN-18 03.09.27 +01:00            2
DEMO_JOB_5    freq=hourly;interval=5         03-JUN-18 08.09.27 +01:00            3
DEMO_JOB_6    freq=hourly;interval=6         02-JUN-18 23.09.27 +01:00            1
DEMO_JOB_6    freq=hourly;interval=6         03-JUN-18 05.09.27 +01:00            2
DEMO_JOB_6    freq=hourly;interval=6         03-JUN-18 11.09.27 +01:00            3
DEMO_JOB_7    freq=hourly;interval=7         03-JUN-18 00.09.27 +01:00            1
DEMO_JOB_7    freq=hourly;interval=7         03-JUN-18 07.09.27 +01:00            2
DEMO_JOB_7    freq=hourly;interval=7         03-JUN-18 14.09.27 +01:00            3
DEMO_JOB_8    freq=hourly;interval=8         03-JUN-18 01.09.27 +01:00            1
DEMO_JOB_8    freq=hourly;interval=8         03-JUN-18 09.09.27 +01:00            2
DEMO_JOB_8    freq=hourly;interval=8         03-JUN-18 17.09.27 +01:00            3
DEMO_JOB_9    freq=hourly;interval=9         03-JUN-18 02.09.27 +01:00            1
DEMO_JOB_9    freq=hourly;interval=9         03-JUN-18 11.09.27 +01:00            2
DEMO_JOB_9    freq=hourly;interval=9         03-JUN-18 20.09.27 +01:00            3
DEMO_JOB_10   freq=hourly;interval=10        03-JUN-18 03.09.27 +01:00            1
DEMO_JOB_10   freq=hourly;interval=10        03-JUN-18 13.09.27 +01:00            2
DEMO_JOB_10   freq=hourly;interval=10        03-JUN-18 23.09.27 +01:00            3

Now let's answer the question: which of these jobs will be running next Monday between 8am and 8pm (I've changed the end time so the daily jobs can appear)? All we need to do is let our recursion run until next Monday 8pm and then filter the results down. We can get the date of next Monday with the next_day function. One point to consider is that the scheduler uses timestamp with time zone; luckily we probably only care about our current time zone, so we can just cast our dates.

At this point we should also change our first pass to use the real next_run_date from the data dictionary. Note that we are approximating each job's run duration as instantaneous, but this shouldn't mess up our results much.

select cast(trunc(next_day(sysdate,'Monday'))+interval'20'hour as timestamp with time zone) next_monday from dual;

NEXT_MONDAY
---------------------------------------------------------------------------
04-JUN-18 20.00.00 +01:00

with function evaluateCalendarString(psCalendarString  in varchar2
                                ,ptStartDate       in timestamp with time zone
                                ,ptReturnDateAfter in timestamp with time zone
                                ) return timestamp with time zone
is
  tNextRun timestamp with time zone;
begin
  dbms_scheduler.evaluate_calendar_string (calendar_string   => psCalendarString  
                                          ,start_date        => ptStartDate       
                                          ,return_date_after => ptReturnDateAfter 
                                          ,next_run_date     => tNextRun
                                          );  
  return tNextRun;
end;
  jobSchedule (job_name, repeat_interval, next_run, lvl)
    as (select job_name, repeat_interval, nvl(next_run_date, start_date) next_run, 1 lvl
        from   user_scheduler_jobs
        where  comments = 'demo'
      union all 
        select job_name, repeat_interval, evaluateCalendarString(repeat_interval,null,next_run) next_run, lvl+1 lvl
        from   jobSchedule
        where  next_run < cast(trunc(next_day(sysdate,'Monday'))+interval'20'hour as timestamp with time zone)
       )
select *
from   jobSchedule
where  next_run between cast(trunc(next_day(sysdate,'Monday'))+interval'8'hour as timestamp with time zone)
                and     cast(trunc(next_day(sysdate,'Monday'))+interval'20'hour as timestamp with time zone)
order by to_number(substr(job_name,10)), lvl;
/

JOB_NAME      REPEAT_INTERVAL           NEXT_RUN                           LVL
------------- ------------------------- --------------------------- ----------
DEMO_JOB_1    freq=daily;interval=1     04-JUN-18 15.39.16 +01:00            2
DEMO_JOB_2    freq=daily;interval=2     04-JUN-18 15.39.16 +01:00            1
DEMO_JOB_5    freq=hourly;interval=5    04-JUN-18 12.39.16 +01:00            9
DEMO_JOB_5    freq=hourly;interval=5    04-JUN-18 17.39.16 +01:00           10
DEMO_JOB_6    freq=hourly;interval=6    04-JUN-18 09.39.16 +01:00            7
DEMO_JOB_6    freq=hourly;interval=6    04-JUN-18 15.39.16 +01:00            8
DEMO_JOB_7    freq=hourly;interval=7    04-JUN-18 09.39.16 +01:00            6
DEMO_JOB_7    freq=hourly;interval=7    04-JUN-18 16.39.16 +01:00            7
DEMO_JOB_8    freq=hourly;interval=8    04-JUN-18 15.39.17 +01:00            6
DEMO_JOB_9    freq=hourly;interval=9    04-JUN-18 12.39.17 +01:00            5
DEMO_JOB_10   freq=hourly;interval=10   04-JUN-18 17.39.17 +01:00            5

Let's apply this to all scheduler jobs and see what's going to run overnight next Saturday (Saturday 10pm to Sunday 2am):

with function evaluateCalendarString(psCalendarString  in varchar2
                                ,ptStartDate       in timestamp with time zone
                                ,ptReturnDateAfter in timestamp with time zone
                                ) return timestamp with time zone
is
  tNextRun timestamp with time zone;
begin
  dbms_scheduler.evaluate_calendar_string (calendar_string   => psCalendarString  
                                          ,start_date        => ptStartDate       
                                          ,return_date_after => ptReturnDateAfter 
                                          ,next_run_date     => tNextRun
                                          );  
  return tNextRun;
end;
  jobSchedule (owner, job_name, job_type, comments, repeat_interval, next_run, lvl)
    as (select owner, job_name, job_type, comments, repeat_interval, nvl(next_run_date, start_date) next_run, 1 lvl
        from   dba_scheduler_jobs
      union all 
        select owner, job_name, job_type, comments, repeat_interval, evaluateCalendarString(repeat_interval,null,next_run) next_run, lvl+1 lvl
        from   jobSchedule
        where  next_run < cast(trunc(next_day(sysdate,'Saturday'))+interval'26'hour as timestamp with time zone)
       )
select *
from   jobSchedule
where  next_run between cast(trunc(next_day(sysdate,'Saturday'))+interval'22'hour as timestamp with time zone)
                and     cast(trunc(next_day(sysdate,'Saturday'))+interval'26'hour as timestamp with time zone)
order by job_name, lvl;
/

OWNER JOB_NAME                       JOB_TYPE         COMMENTS                              REPEAT_INTERVAL                           NEXT_RUN                         LVL
----- ------------------------------ ---------------- ------------------------------------- ----------------------------------------- ------------------------- ----------
SYS   CLEANUP_ONLINE_IND_BUILD       PLSQL_BLOCK      Cleanup Online Index Build            FREQ = HOURLY; INTERVAL = 1               09-JUN-18 22.39.46 +01:00        173
SYS   CLEANUP_ONLINE_IND_BUILD       PLSQL_BLOCK      Cleanup Online Index Build            FREQ = HOURLY; INTERVAL = 1               09-JUN-18 23.39.46 +01:00        174
SYS   CLEANUP_ONLINE_IND_BUILD       PLSQL_BLOCK      Cleanup Online Index Build            FREQ = HOURLY; INTERVAL = 1               10-JUN-18 00.39.46 +01:00        175
SYS   CLEANUP_ONLINE_IND_BUILD       PLSQL_BLOCK      Cleanup Online Index Build            FREQ = HOURLY; INTERVAL = 1               10-JUN-18 01.39.46 +01:00        176
SYS   CLEANUP_ONLINE_PMO             PLSQL_BLOCK      Cleanup after Failed PMO              FREQ = HOURLY; INTERVAL = 1               09-JUN-18 22.40.26 +01:00        173
SYS   CLEANUP_ONLINE_PMO             PLSQL_BLOCK      Cleanup after Failed PMO              FREQ = HOURLY; INTERVAL = 1               09-JUN-18 23.40.26 +01:00        174
SYS   CLEANUP_ONLINE_PMO             PLSQL_BLOCK      Cleanup after Failed PMO              FREQ = HOURLY; INTERVAL = 1               10-JUN-18 00.40.26 +01:00        175
SYS   CLEANUP_ONLINE_PMO             PLSQL_BLOCK      Cleanup after Failed PMO              FREQ = HOURLY; INTERVAL = 1               10-JUN-18 01.40.26 +01:00        176
SYS   CLEANUP_TAB_IOT_PMO            PLSQL_BLOCK      Cleanup Tables after IOT PMO          FREQ = HOURLY; INTERVAL = 1               09-JUN-18 22.39.56 +01:00        173
SYS   CLEANUP_TAB_IOT_PMO            PLSQL_BLOCK      Cleanup Tables after IOT PMO          FREQ = HOURLY; INTERVAL = 1               09-JUN-18 23.39.56 +01:00        174
SYS   CLEANUP_TAB_IOT_PMO            PLSQL_BLOCK      Cleanup Tables after IOT PMO          FREQ = HOURLY; INTERVAL = 1               10-JUN-18 00.39.56 +01:00        175
SYS   CLEANUP_TAB_IOT_PMO            PLSQL_BLOCK      Cleanup Tables after IOT PMO          FREQ = HOURLY; INTERVAL = 1               10-JUN-18 01.39.56 +01:00        176
SYS   CLEANUP_TRANSIENT_PKG          PLSQL_BLOCK      Cleanup Transient Packages            FREQ = HOURLY; INTERVAL = 2               09-JUN-18 23.40.16 +01:00         87
SYS   CLEANUP_TRANSIENT_PKG          PLSQL_BLOCK      Cleanup Transient Packages            FREQ = HOURLY; INTERVAL = 2               10-JUN-18 01.40.16 +01:00         88
SYS   CLEANUP_TRANSIENT_TYPE         PLSQL_BLOCK      Cleanup Transient Types               FREQ = HOURLY; INTERVAL = 2               09-JUN-18 23.27.38 +01:00         83
SYS   CLEANUP_TRANSIENT_TYPE         PLSQL_BLOCK      Cleanup Transient Types               FREQ = HOURLY; INTERVAL = 2               10-JUN-18 01.27.38 +01:00         84
ANDY  DEMO_JOB_5                     PLSQL_BLOCK      demo                                  freq=hourly;interval=5                    09-JUN-18 22.39.16 +01:00         35
ANDY  DEMO_JOB_7                     PLSQL_BLOCK      demo                                  freq=hourly;interval=7                    09-JUN-18 22.39.16 +01:00         25
ANDY  DEMO_JOB_8                     PLSQL_BLOCK      demo                                  freq=hourly;interval=8                    09-JUN-18 23.39.17 +01:00         22
SYS   RSE$CLEAN_RECOVERABLE_SCRIPT   PLSQL_BLOCK      auto clean job for recoverable script freq=daily;byhour=0;byminute=0;bysecond=0 10-JUN-18 00.00.00 +01:00          8
SYS   SM$CLEAN_AUTO_SPLIT_MERGE      PLSQL_BLOCK      auto clean job for auto split merge   freq=daily;byhour=0;byminute=0;bysecond=0 10-JUN-18 00.00.00 +01:00          8

Hopefully you've found this an interesting use of recursive subquery factoring and PL/SQL in the WITH clause to get useful details about your scheduler jobs.

SQL*Plus for CSV Reports (and batching with Powershell)

I've seen a lot of recent solutions to producing CSV reports from the command line that are just so over the top. I've seen scripts that select the column names from dual and then union all that to the results of a query. I can't fathom the headaches that people must have from supporting these abominations. Luckily, you don't need to live like that. All you need is a recent Oracle Client (12.2 onwards); remember, you don't need to be running a 12.2 database, it's just the SQL*Plus client we're after.

You might as well download the latest Oracle Client version you can get; at the time of writing that is 18.1.0.0.0, which includes the bonus features of rowlimit, feedback sql_id and linesize window.
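
Those features look like this at the prompt (illustrative only; check the 18c SQL*Plus documentation for the full syntax):

SQL> set rowlimit 5
SQL> set linesize window
SQL> set feedback on sql_id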

To enable CSV formatting in SQL*Plus, all you need to do is

set markup csv on

Then run your query; SQL*Plus will take care of the rest for you!

SQL> set markup csv on
SQL> select dummy, 'a' a from dual;

"DUMMY","A"
"X","a"

How easy was that?
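
If you need a different separator or no quoting, the markup command takes options too, for example:

SQL> set markup csv on delimiter | quote off
SQL> select dummy, 'a' a from dual;

DUMMY|A
X|a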

You can even set the markup directly from your command line when you call SQL*Plus with the -M argument.

C:\Users\andys>sqlplus -M "csv on" andy/andy@pdb1

SQL*Plus: Release 18.0.0.0.0 Production on Sat May 26 18:38:36 2018
Version 18.2.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.

Last Successful login time: Sat May 26 2018 18:35:43 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> select 'world' hello from dual;

"HELLO"
"world"

SQL>

If we want to turn this into a more batch-friendly solution, there are a few things we would change: we wouldn't want all the SQL*Plus spam (for lack of a better word), we would want to save the output to a file, and we would want SQL*Plus to exit when it's done. One other important thing, only obvious when it goes wrong: we don't want SQL*Plus to prompt for user input if something goes wrong with the login.

How do we do all of that? Easy: we save our report SQL into a script like this:

my_report.sql:
set feedback off termout off
spool my_report.log
select object_type, count(*) cnt from all_objects group by object_type;
exit

And we call the script like so:

sqlplus -M "csv on" -L -S andy/andy@pdb1 @my_report.sql

The -L switch prevents SQL*Plus prompting for input if something is wrong with our connection (see https://connor-mcdonald.com/2016/03/26/are-your-sql-plus-scripts-going-to-ell/)
The -S switch suppresses most of the spammy output

The result:

C:\Users\andys>sqlplus -M "csv on" -L -S andy/andy@pdb1 @my_report.sql

C:\Users\andys>strings my_report.log
"OBJECT_TYPE","CNT"
"EDITION",1
...
"CLUSTER",10
"TYPE",2125
"EVALUATION CONTEXT",3

C:\Users\andys>

Lovely!

Powershell

Since I'm on Windows, I end up using PowerShell a lot for my batch processes; can we use it to run our reports? Absolutely! We can even rely on PowerShell to do the spooling for us, as well as making sure the important bits like feedback and exiting are taken care of.

Here's my new cut-down report script; you'll notice it's just the one SQL statement:

C:\Users\andys>strings my_report_v2.sql
select object_type, count(*) cnt from all_objects group by object_type;

C:\Users\andys>

We can import this into a PowerShell string array using the get-content cmdlet, and combine it with our own custom header and footer commands using standard array addition.

PS C:\Users\andys> [string[]] $headerCommands = "set feedback off","set termout on"
PS C:\Users\andys> [string[]] $footerCommands = "exit"
PS C:\Users\andys> [string[]] $scriptCommands = get-content my_report_v2.sql
PS C:\Users\andys> $headerCommands + $scriptCommands + $footerCommands
set feedback off
set termout on
select object_type, count(*) cnt from all_objects group by object_type;
exit
PS C:\Users\andys>

We can then pipe these commands into a SQL*Plus session, and pipe the results out to a file, with the following:

PS C:\Users\andys> $headerCommands + $scriptCommands + $footerCommands | sqlplus --% -M "csv on" -L -S andy/andy@pdb1 | set-content spool_file.log

Note the --% just after sqlplus: this stop-parsing token lets us specify our arguments without PowerShell trying to interpret them. You could also make the target filename dynamic; perhaps it should include the date, or default to something based on the script file path.

It might be useful to save this as a PowerShell script that you can just call when you need to. As we are saving this, we will remove the hard-coded connection string and credentials and instead accept ones passed to the script (you can go down the Oracle Wallet road, but that's for another day). I've also included the ability to pass in standalone statements:

oracleCSV.ps1
Param(
  [string]$connection,
  [string]$queryPath,
  [string]$query,
  [string]$outputPath,
  [switch]$show
)

if (!$queryPath -and !$query) {
  throw "No query to run, use -query or -queryPath"
}

if (!$outputPath) {
  if (!$queryPath) {
    $outputPath = "report.csv"
  } else {
    $outputPath = [io.path]::GetFileNameWithoutExtension($queryPath) + ".csv"
  }
}
write-host "saving to $outputPath"

[string[]] $headerCommands = "set feedback off","set termout on"
[string[]] $footerCommands = "exit"

if (!$query) {
  [string[]] $scriptCommands = get-content $queryPath
} else {
  [string[]] $scriptCommands = $query
}
 
$headerCommands + $scriptCommands + $footerCommands | sqlplus -M "csv on" -L -S $connection | set-content $outputPath
if ($show) {
  get-content $outputPath
}

Demo:

PS C:\Users\andys> .\oracleCSV.ps1 "andy/andy@pdb1" -query "select * from dual;" -show
saving to report.csv

"DUMMY"
"X"

PS C:\Users\andys> .\oracleCSV.ps1 "andy/andy@pdb1" -query "select object_type, object_name from user_objects where rownum <=10;" -outputPath "myobjects.csv" -show
saving to myobjects.csv

"OBJECT_TYPE","OBJECT_NAME"
"TABLE","AS_BIG_TABLE"
"TABLE","AS_BULK_INSERT_ME"
"TRIGGER","AS_BULK_INSERT_ME_TRG"
"TABLE","AS_PK"
"TABLE","AS_TAB"
"TABLE","AS_TAB2"
"INDEX","AS_TAB_UIX"
"TABLE","AS_TINY_TABLE"
"INDEX","AS_TINY_TABLE_TINY_INDEX"
"TABLE","BASELINE_BIG_TABLE"

PS C:\Users\andys> .\oracleCSV.ps1 "andy/andy@pdb1" -queryPath "my_report_v2.sql" -show
saving to my_report_v2.csv

"OBJECT_TYPE","CNT"
"EDITION",1
"INDEX PARTITION",189
"TABLE SUBPARTITION",32
"CONSUMER GROUP",18
"SEQUENCE",103
"TABLE PARTITION",308
"SCHEDULE",4
"JAVA DATA",7
"PROCEDURE",72
"OPERATOR",62
"DESTINATION",2
"WINDOW",9
"SCHEDULER GROUP",4
"PACKAGE",770
"PACKAGE BODY",343
"PROGRAM",1
"RULE SET",4
"LIBRARY",67
"TYPE BODY",111
"XML SCHEMA",43
"JAVA RESOURCE",1728
"DIRECTORY",11
"JOB CLASS",2
"TRIGGER",137
"MATERIALIZED VIEW",2
"INDEX",2180
"TABLE",2183
"SYNONYM",12178
"VIEW",7000
"FUNCTION",350
"INDEXTYPE",11
"JAVA CLASS",38125
"CLUSTER",10
"TYPE",2125
"EVALUATION CONTEXT",3
PS C:\Users\andys>

Hopefully, it won't take much effort for you to benefit from these in your own batch scripts.

Not just another ORA-12154: TNS:could not resolve the connect identifier specified (12.1)

There are many posts out there describing why you would normally get this error, but here’s one reason that I’ve not seen before. It looks like a bug specific to the 12.1 client (possibly Windows only).

The setup for this issue requires that you (for some reason) have multiple aliases for the same entry in your tnsnames.ora, at least one of them with a domain. I really don't like the use of domains in tnsnames aliases, and I really don't like the use of names.default_domain in the sqlnet.ora file, but this was just part of the standard setup used in my shop, so I have to deal with it.
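
For anyone unfamiliar, names.default_domain is what lets a bare alias pick up a domain: with an (illustrative) entry like the one below in sqlnet.ora, a connection to one is resolved as one.DOMAIN.

NAMES.DEFAULT_DOMAIN = DOMAIN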

Anyway, here’s my tnsnames entry:

C:\Users\andys>set tns_admin=C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin

C:\Users\andys>strings C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\tnsnames.ora
ONE.DOMAIN, TWO.DOMAIN, ONE =
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))
)
(CONNECT_DATA=
(SID=NOTREAL)
(SERVER=DEDICATED)
)
)

As you can see, I should be able to use three different names for this entry: one.domain, two.domain or one. This works fine in my 12.2 Oracle client:

C:\Users\andys>C:\app\andys\virtual\product\12.2.0\dbhome_1\bin\tnsping.exe one

TNS Ping Utility for 64-bit Windows: Version 12.2.0.1.0 - Production on 28-APR-2018 16:42:13

Copyright (c) 1997, 2016, Oracle. All rights reserved.

Used parameter files:
C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))) (CONNECT_DATA= (SID=NOTREAL) (SERVER=DEDICATED)))
OK (20 msec)

C:\Users\andys>C:\app\andys\virtual\product\12.2.0\dbhome_1\bin\tnsping.exe two.domain

TNS Ping Utility for 64-bit Windows: Version 12.2.0.1.0 - Production on 28-APR-2018 16:42:16

Copyright (c) 1997, 2016, Oracle. All rights reserved.

Used parameter files:
C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))) (CONNECT_DATA= (SID=NOTREAL) (SERVER=DEDICATED)))
OK (0 msec)

C:\Users\andys>C:\app\andys\virtual\product\12.2.0\dbhome_1\bin\tnsping.exe one.domain

TNS Ping Utility for 64-bit Windows: Version 12.2.0.1.0 - Production on 28-APR-2018 16:42:19

Copyright (c) 1997, 2016, Oracle. All rights reserved.

Used parameter files:
C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))) (CONNECT_DATA= (SID=NOTREAL) (SERVER=DEDICATED)))
OK (30 msec)

(All successfully identified the tnsnames entry).

But with my 12.1 Client:

C:\Users\andys>tnsping one

TNS Ping Utility for 64-bit Windows: Version 12.1.0.2.0 - Production on 28-APR-2018 16:40:40

Copyright (c) 1997, 2014, Oracle. All rights reserved.

Used parameter files:
C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))) (CONNECT_DATA= (SID=NOTREAL) (SERVER=DEDICATED)))
OK (0 msec)

C:\Users\andys>tnsping two.domain

TNS Ping Utility for 64-bit Windows: Version 12.1.0.2.0 - Production on 28-APR-2018 16:40:45

Copyright (c) 1997, 2014, Oracle. All rights reserved.

Used parameter files:
C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))) (CONNECT_DATA= (SID=NOTREAL) (SERVER=DEDICATED)))
OK (0 msec)

C:\Users\andys>tnsping one.domain

TNS Ping Utility for 64-bit Windows: Version 12.1.0.2.0 - Production on 28-APR-2018 16:40:49

Copyright (c) 1997, 2014, Oracle. All rights reserved.

Used parameter files:
C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\sqlnet.ora

TNS-03505: Failed to resolve name

Somehow, we are able to use one and two.domain, but we're not able to use one.domain. Perhaps the existence of one (without the domain) is overriding the one.domain name. What happens if we reorder the names in our tnsnames.ora entry?

C:\Users\andys>strings C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\tnsnames.ora
TWO.DOMAIN, ONE,ONE.DOMAIN =
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))
)
(CONNECT_DATA=
(SID=NOTREAL)
(SERVER=DEDICATED)
)
)

C:\Users\andys>tnsping two.domain

TNS Ping Utility for 64-bit Windows: Version 12.1.0.2.0 - Production on 28-APR-2018 16:45:05

Copyright (c) 1997, 2014, Oracle. All rights reserved.

Used parameter files:
C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\sqlnet.ora

 

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))) (CONNECT_DATA= (SID=NOTREAL) (SERVER=DEDICATED)))
OK (0 msec)

C:\Users\andys>tnsping one

TNS Ping Utility for 64-bit Windows: Version 12.1.0.2.0 - Production on 28-APR-2018 16:45:09

Copyright (c) 1997, 2014, Oracle. All rights reserved.

Used parameter files:
C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\sqlnet.ora

 

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))) (CONNECT_DATA= (SID=NOTREAL) (SERVER=DEDICATED)))
OK (10 msec)

C:\Users\andys>tnsping one.domain

TNS Ping Utility for 64-bit Windows: Version 12.1.0.2.0 - Production on 28-APR-2018 16:45:17

Copyright (c) 1997, 2014, Oracle. All rights reserved.

Used parameter files:
C:\app\andys\virtual\product\12.2.0\dbhome_1\network\admin\sqlnet.ora

 

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))) (CONNECT_DATA= (SID=NOTREAL) (SERVER=DEDICATED)))
OK (10 msec)

Hurray!

This buggy behaviour only seems to be in place for 12.1, so upgrade your client to 12.2 today (and get a ton of cool new SQL*Plus things)!
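If you can’t upgrade a client straight away, the simplest workaround I can think of (untested beyond the reordering shown above, and using my made-up aliases) is to avoid the shared alias list entirely and give each name its own entry:

ONE.DOMAIN =
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS= (PROTOCOL=TCP)(Host=localhost)(Port = 1521))
)
(CONNECT_DATA=
(SID=NOTREAL)
(SERVER=DEDICATED)
)
)

ONE =
(DESCRIPTION=
# ...repeat the same description for ONE and for TWO.DOMAIN
)

A bit of duplication, but nothing for the 12.1 name lookup to trip over.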

 

Really Bulk DML

This post has been sitting in my drafts for a long time. I haven’t yet dug into everything as much as I’d like, but I have some hunches and I know where I want to look. The problem is that, since I ran these tests, I have upgraded to a new system; it’s still home-quality kit, but it will probably have a different performance profile. Hopefully posting this now will give me enough of a kick to get the rest finished.

The problem: you want to update all the rows in a big table using data from another table. How do you do it fast?

You may have your own reasons for doing this, but for me this is quite a common scenario:
We want to add further data to a large fact table in the data warehouse. The OLTP database has been storing this data for years, but there has been no point in sending it to the data warehouse, until now.

There are a few difficulties you might run into; I don’t intend to cover them all, only to focus on one problem. As always, the real best solution depends on the requirements and the problems at hand. That said, there is usually a solution which solves the problem completely.

To start the demo, I’ll build a base table; this will be the table I use when creating tables for the actual demo. Between tests, I will drop my demo tables and rebuild them from the base table.

ANDY@pdb1>set echo on
ANDY@pdb1>col segment_name for a30
ANDY@pdb1>col segment_type for a30
ANDY@pdb1>
ANDY@pdb1>drop table base_table;

Table dropped.

ANDY@pdb1>select max(object_id) from dba_objects;

MAX(OBJECT_ID)                                                                                                                                                                                          
--------------                                                                                                                                                                                          
         80871                                                                                                                                                                                          

ANDY@pdb1>create table base_table as
  2  select owner,object_name,subobject_name,r.rn+o.object_id object_id,data_object_id,object_type,created
  3        ,last_ddl_time,timestamp,status,temporary,generated,secondary,namespace
  4        ,edition_name,sharing,editionable,oracle_maintained,application,default_collation
  5        ,duplicated,sharded,created_appid,created_vsnid,modified_appid,modified_vsnid
  6  from   (select * from dba_objects where object_id is not null) o
  7  cross join (select rownum*100000 rn from dual connect by rownum <= 10) r;

Table created.

ANDY@pdb1>select segment_name, segment_type, blocks,  round(bytes/1024/1024,2) size_mb
  2  from   dba_segments
  3  where  owner = user
  4  and    segment_name = 'BASE_TABLE'
ANDY@pdb1>/

SEGMENT_NAME                   SEGMENT_TYPE                       BLOCKS    SIZE_MB
------------------------------ ------------------------------ ---------- ----------
BASE_TABLE                     TABLE                               16000        125 

That might not seem big to you, but this demo is running from my Surface tablet and my buffer cache can be measured in MB (I want to be able to use the tablet for non-DB related things too):

ANDY@pdb1>select name, round(bytes/1024/1024,2) size_mb from v$sgastat where name = 'buffer_cache';

NAME                              SIZE_MB
------------------------------ ----------
buffer_cache                           44

This is our setup script for the fact table and the table we want to take values from

ANDY@pdb1>get bulk_dml_setup.sql
  1  set timing off
  2  set autotrace off
  3  set echo off
  4  drop table big_table;
  5  drop table tmp_big_table_object_name;
  6  create table big_table as 
  7  select * from base_table;
  8  create unique index big_table_pk on big_table (object_id);
  9  alter table big_table add constraint big_table_pk primary key (object_id) using index;
 10  create table tmp_big_table_object_name
 11  as
 12  select object_id, reverse(object_name) object_name
 13  from   big_table;
 14  select segment_name, segment_type, blocks,  round(bytes/1024/1024,2) size_mb
 15  from   dba_segments 
 16  where  owner = user
 17  and    segment_name like '%BIG_TABLE%';
 18  select name, round(bytes/1024/1024,2) size_mb from v$sgastat where name = 'buffer_cache';
 19  set echo on
 20  set timing on
 21* set autotrace on

We will be updating big_table with values from tmp_big_table_object_name. Both tables have the same number of rows, and the size of the new value is identical to the size of the old value. This is a simplification compared to real life, but taking some of the variables out is always helpful for a demo. The segments and cache queries are there to make sure nothing has happened to the table sizes and that the cache hasn’t automatically grown; those numbers will be good to have close by when we comment on things we see in the autotrace output.

I’m using SQL*Plus autotrace to show the statistics delta from executing the SQL, and also to show the explain plan. I have no reason to believe the actual execution plan will be any different, but we can always double-check if we get suspicious.
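For example, a sketch of that double-check, reusing the dbms_xplan.display_cursor pattern from earlier (the gather_plan_statistics hint asks Oracle to collect the row source statistics that the allstats format reports):

ANDY@pdb1>set autotrace off
ANDY@pdb1>update /*+ gather_plan_statistics */ big_table t
  2  set    t.object_name = (select u.object_name
  3                          from   tmp_big_table_object_name u
  4                          where  u.object_id = t.object_id
  5                         )
  6  where  rownum <= 1000;
ANDY@pdb1>select * from table(dbms_xplan.display_cursor(format=>'allstats last'));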

ANDY@pdb1>@@bulk_dml_setup.sql
ANDY@pdb1>set timing off
ANDY@pdb1>set autotrace off
ANDY@pdb1>set echo off

Table dropped.


Table dropped.


Table created.


Index created.


Table altered.


Table created.


SEGMENT_NAME                   SEGMENT_TYPE                       BLOCKS    SIZE_MB                                                                                                                     
------------------------------ ------------------------------ ---------- ----------                                                                                                                     
BIG_TABLE                      TABLE                               15360        120                                                                                                                     
TMP_BIG_TABLE_OBJECT_NAME      TABLE                                4864         38                                                                                                                     
BIG_TABLE_PK                   INDEX                                1664         13                                                                                                                     


NAME                              SIZE_MB                                                                                                                                                               
------------------------------ ----------                                                                                                                                                               
buffer_cache                           44                                                                                                                                                               

ANDY@pdb1>set timing on
ANDY@pdb1>set autotrace on
ANDY@pdb1>

(I won’t repeat this output each time, you can trust that this information does not change between tests)

Now to the good part, or rather the first attempt at an update. This is possibly the most common starting point (from what I’ve seen): a scalar subquery in an update statement. I’ve not indexed the new-value table, so the execution plans available are limited. I know this is going to be very slow, so I’ve snuck in an additional rownum filter so we are not updating every row. (I did kick off the same update without the rownum filter, but it ran for days before I killed it.)

ANDY@pdb1>update big_table t
  2  set    t.object_name = (select u.object_name
  3                          from   tmp_big_table_object_name u
  4                          where  u.object_id = t.object_id
  5                         )
  6  where  rownum <= 1000;

1000 rows updated.

Elapsed: 00:10:02.21

Execution Plan
----------------------------------------------------------                                                                                                                                              
Plan hash value: 2714481146                                                                                                                                                                             
                                                                                                                                                                                                        
-------------------------------------------------------------------------------------------------                                                                                                       
| Id  | Operation           | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                       
-------------------------------------------------------------------------------------------------                                                                                                       
|   0 | UPDATE STATEMENT    |                           |  1000 | 41000 |  1013M  (1)| 10:59:39 |                                                                                                       
|   1 |  UPDATE             | BIG_TABLE                 |       |       |            |          |                                                                                                       
|*  2 |   COUNT STOPKEY     |                           |       |       |            |          |                                                                                                       
|   3 |    TABLE ACCESS FULL| BIG_TABLE                 |   771K|    30M|  4116   (1)| 00:00:01 |                                                                                                       
|*  4 |   TABLE ACCESS FULL | TMP_BIG_TABLE_OBJECT_NAME |     1 |    39 |  1314   (1)| 00:00:01 |                                                                                                       
-------------------------------------------------------------------------------------------------                                                                                                       
                                                                                                                                                                                                        
Predicate Information (identified by operation id):                                                                                                                                                     
---------------------------------------------------                                                                                                                                                     
                                                                                                                                                                                                        
   2 - filter(ROWNUM<=1000)                                                                                                                                                                             
   4 - filter("U"."OBJECT_ID"=:B1)                                                                                                                                                                      


Statistics
----------------------------------------------------------                                                                                                                                              
         14  recursive calls                                                                                                                                                                            
       6069  db block gets                                                                                                                                                                              
    4743036  consistent gets                                                                                                                                                                            
    4731026  physical reads                                                                                                                                                                             
     574080  redo size                                                                                                                                                                                  
        873  bytes sent via SQL*Net to client                                                                                                                                                           
       1147  bytes received via SQL*Net from client                                                                                                                                                     
          3  SQL*Net roundtrips to/from client                                                                                                                                                          
          2  sorts (memory)                                                                                                                                                                             
          0  sorts (disk)                                                                                                                                                                               
       1000  rows processed     

10 minutes to update 1000 rows; no wonder it would take days to do the entire result set. I can see immediately, from the explain plan, that for every row we read from big_table we executed a scalar subquery to do a full tablescan of tmp_big_table_object_name. Those 1000 full tablescans read 4864 blocks each; multiply those two numbers together and we get close to the 4.7 million consistent gets stat and the almost identical physical reads stat. Misleadingly, due to the way the statistics are presented, it looks like redo size was only about a tenth as significant as physical reads, but we’ve got to remember that physical reads is in units of DB blocks (8KB blocks in my case) while redo size is in bytes. We’re comparing reading 36 GB to writing 560 KB: big difference!
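To make that unit mismatch concrete, here’s the back-of-the-envelope arithmetic as a query (8KB block size assumed, as on my system):

ANDY@pdb1>select 1000*4864 expected_gets
  2        ,round(4731026*8192/1024/1024/1024,1) read_gb
  3        ,round(574080/1024) redo_kb
  4  from   dual;

EXPECTED_GETS    READ_GB    REDO_KB
------------- ---------- ----------
      4864000       36.1        561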

For the sake of completeness, here’s a Real-Time SQL Monitoring report from the execution just before I killed it. I don’t think the execution started time is accurate.

MONITOR_REPORT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
update big_table t set t.object_name = (select u.object_name from tmp_big_table_object_name u where u.object_id = t.object_id )

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 Session             :  ANDY (272:25747)
 SQL ID              :  aad53w2d7c20h
 SQL Execution ID    :  16777216
 Execution Started   :  11/20/2017 07:51:53
 First Refresh Time  :  11/20/2017 07:51:57
 Last Refresh Time   :  11/22/2017 00:32:57
 Duration            :  146458s
 Module/Action       :  SQL*Plus/-
 Service             :  pdb1.andrew
 Program             :  sqlplus.exe

Global Stats
====================================================================================
| Elapsed |   Cpu   |    IO    | Application | Concurrency | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Waits(s)   |  Gets  | Reqs | Bytes |
====================================================================================
|  148329 |   22323 |   126004 |        0.08 |        1.71 |     1G | 143M |   8TB |
====================================================================================

SQL Plan Monitoring Details (Plan Hash Value=1080485927)
===========================================================================================================================================================================
| Id   |      Operation       |           Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity | Activity Detail | Progress |
|      |                      |                           | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |   (# samples)   |          |
===========================================================================================================================================================================
| -> 0 | UPDATE STATEMENT     |                           |         |      |    146495 |     +4 |     1 |        0 |      |       |          |                 |          |
| -> 1 |   UPDATE             | BIG_TABLE                 |         |      |    146495 |     +4 |     1 |        0 |   44 | 352KB |          |                 |          |
| -> 2 |    TABLE ACCESS FULL | BIG_TABLE                 |    771K | 4116 |    146495 |     +4 |     1 |     238K |  819 |  65MB |          |                 |      30% |
| -> 3 |    TABLE ACCESS FULL | TMP_BIG_TABLE_OBJECT_NAME |       1 | 1314 |    146495 |     +4 |  238K |     238K | 143M |   8TB |          |                 |          |
===========================================================================================================================================================================

As we can see, we are spending all the time reading the tmp table, so there are two typical approaches to making it faster:
A) rewrite the statement so we only do the full scan once
B) create an index so we don’t have to look at all of the table in each iteration

As A could mean more effort, let’s start with B. First, let’s reset the tables:

ANDY@pdb1>@@bulk_dml_setup.sql

We add a standard B-tree index on the joining column. We’ll keep the rownum predicate, just in case. I’m including the time taken to create the index here because I see it as part of the overall process; the index might be required separately anyway, in which case its creation time matters less.

ANDY@pdb1>create index tmp_idx_01 on tmp_big_table_object_name (object_id);

Index created.

Elapsed: 00:00:08.98
ANDY@pdb1>
ANDY@pdb1>update big_table t
  2  set    t.object_name = (select u.object_name
  3                          from   tmp_big_table_object_name u
  4                          where  u.object_id = t.object_id
  5                         )
  6  where  rownum <= 1000;

1000 rows updated.

Elapsed: 00:00:00.06

Execution Plan
----------------------------------------------------------                                                                                                                                              
Plan hash value: 2582342017                                                                                                                                                                             
                                                                                                                                                                                                        
------------------------------------------------------------------------------------------------------------------                                                                                      
| Id  | Operation                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                      
------------------------------------------------------------------------------------------------------------------                                                                                      
|   0 | UPDATE STATEMENT                     |                           |  1000 | 41000 |  3859K (20)| 00:02:31 |                                                                                      
|   1 |  UPDATE                              | BIG_TABLE                 |       |       |            |          |                                                                                      
|*  2 |   COUNT STOPKEY                      |                           |       |       |            |          |                                                                                      
|   3 |    TABLE ACCESS FULL                 | BIG_TABLE                 |   771K|    30M|  4117   (1)| 00:00:01 |                                                                                      
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| TMP_BIG_TABLE_OBJECT_NAME |     1 |    39 |     4   (0)| 00:00:01 |                                                                                      
|*  5 |    INDEX RANGE SCAN                  | TMP_IDX_01                |     1 |       |     3   (0)| 00:00:01 |                                                                                      
------------------------------------------------------------------------------------------------------------------                                                                                      
                                                                                                                                                                                                        
Predicate Information (identified by operation id):                                                                                                                                                     
---------------------------------------------------                                                                                                                                                     
                                                                                                                                                                                                        
   2 - filter(ROWNUM<=1000)                                                                                                                                                                             
   5 - access("U"."OBJECT_ID"=:B1)                                                                                                                                                                      


Statistics
----------------------------------------------------------                                                                                                                                              
          5  recursive calls                                                                                                                                                                            
       2062  db block gets                                                                                                                                                                              
       2067  consistent gets                                                                                                                                                                            
         28  physical reads                                                                                                                                                                             
     547624  redo size                                                                                                                                                                                  
        873  bytes sent via SQL*Net to client                                                                                                                                                           
       1147  bytes received via SQL*Net from client                                                                                                                                                     
          3  SQL*Net roundtrips to/from client                                                                                                                                                          
          2  sorts (memory)                                                                                                                                                                             
          0  sorts (disk)                                                                                                                                                                               
       1000  rows processed   

It took less than a tenth of a second to update 1000 rows, and we can see the plan was what we expected: index access to tmp to read one row for each row in big_table. We could note that the redo size was ever-so-slightly less than previously, but I’m going to shrug that off. We only did 28 physical reads, but 2067 consistent gets and 2062 db block gets.

Now that we’re confident this isn’t a disaster, let’s do the update like this for the entire table.

ANDY@pdb1>@@bulk_dml_setup.sql

ANDY@pdb1>create index tmp_idx_01 on tmp_big_table_object_name (object_id);

Index created.

Elapsed: 00:00:05.89
ANDY@pdb1>
ANDY@pdb1>update big_table t
  2  set    t.object_name = (select u.object_name
  3                          from   tmp_big_table_object_name u
  4                          where  u.object_id = t.object_id
  5                         );

771090 rows updated.

Elapsed: 00:02:50.62

Execution Plan
----------------------------------------------------------                                                                                                                                              
Plan hash value: 3303311140                                                                                                                                                                             
                                                                                                                                                                                                        
------------------------------------------------------------------------------------------------------------------                                                                                      
| Id  | Operation                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                      
------------------------------------------------------------------------------------------------------------------                                                                                      
|   0 | UPDATE STATEMENT                     |                           |   771K|    30M|  3859K (20)| 00:02:31 |                                                                                      
|   1 |  UPDATE                              | BIG_TABLE                 |       |       |            |          |                                                                                      
|   2 |   TABLE ACCESS FULL                  | BIG_TABLE                 |   771K|    30M|  4116   (1)| 00:00:01 |                                                                                      
|   3 |   TABLE ACCESS BY INDEX ROWID BATCHED| TMP_BIG_TABLE_OBJECT_NAME |     1 |    39 |     4   (0)| 00:00:01 |                                                                                      
|*  4 |    INDEX RANGE SCAN                  | TMP_IDX_01                |     1 |       |     3   (0)| 00:00:01 |                                                                                      
------------------------------------------------------------------------------------------------------------------                                                                                      
                                                                                                                                                                                                        
Predicate Information (identified by operation id):                                                                                                                                                     
---------------------------------------------------                                                                                                                                                     
                                                                                                                                                                                                        
   4 - access("U"."OBJECT_ID"=:B1)                                                                                                                                                                      


Statistics
----------------------------------------------------------                                                                                                                                              
        113  recursive calls                                                                                                                                                                            
    1587711  db block gets                                                                                                                                                                              
    1581779  consistent gets                                                                                                                                                                            
      21453  physical reads                                                                                                                                                                             
  449794452  redo size                                                                                                                                                                                  
        873  bytes sent via SQL*Net to client                                                                                                                                                           
       1125  bytes received via SQL*Net from client                                                                                                                                                     
          3  SQL*Net roundtrips to/from client                                                                                                                                                          
          2  sorts (memory)                                                                                                                                                                             
          0  sorts (disk)                                                                                                                                                                               
     771090  rows processed                                                                                                                                                                             

It completed, excellent. But look at those numbers: we did 1,587,711 db block gets to update 771,090 rows. That’s about 2:1, which is the sort of gets you’d expect when you do an index read and then a table read for every row. From a timing point of view it wasn’t bad at all: if we pretend all our time was due to physical reads (ignoring the CPU overhead etc.), then each read took about 8 ms (170 s / 21,453 reads), which sounds about right on this non-server system.

If we got down to below 3 minutes with an index, and we really do want to read all of our tmp table, imagine what we could do with a full tablescan! One way of writing this method is with a merge statement.
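For reference, another way of writing it is an update of a join view; here is a sketch. It requires big_table to be key-preserved in the view, which means tmp_big_table_object_name.object_id would need a unique index or constraint (my setup script doesn’t create one, so treat this as untested here):

update (select t.object_name old_name, u.object_name new_name
        from   big_table t
        join   tmp_big_table_object_name u
        on     u.object_id = t.object_id
       )
set    old_name = new_name;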

ANDY@pdb1>@@bulk_dml_setup.sql

ANDY@pdb1>merge into big_table t
  2  using tmp_big_table_object_name u
  3  on (t.object_id = u.object_id)
  4  when matched then update
  5  set t.object_name = u.object_name;

771090 rows merged.

Elapsed: 00:23:27.10

Execution Plan
----------------------------------------------------------                                                                                                                                              
Plan hash value: 1637517488                                                                                                                                                                             
                                                                                                                                                                                                        
----------------------------------------------------------------------------------------------------------                                                                                              
| Id  | Operation            | Name                      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |                                                                                              
----------------------------------------------------------------------------------------------------------                                                                                              
|   0 | MERGE STATEMENT      |                           |   771K|    74M|       | 15947   (1)| 00:00:01 |                                                                                              
|   1 |  MERGE               | BIG_TABLE                 |       |       |       |            |          |                                                                                              
|   2 |   VIEW               |                           |       |       |       |            |          |                                                                                              
|*  3 |    HASH JOIN         |                           |   771K|   125M|    37M| 15947   (1)| 00:00:01 |                                                                                              
|   4 |     TABLE ACCESS FULL| TMP_BIG_TABLE_OBJECT_NAME |   771K|    28M|       |  1313   (1)| 00:00:01 |                                                                                              
|   5 |     TABLE ACCESS FULL| BIG_TABLE                 |   771K|    97M|       |  4130   (1)| 00:00:01 |                                                                                              
----------------------------------------------------------------------------------------------------------                                                                                              
                                                                                                                                                                                                        
Predicate Information (identified by operation id):                                                                                                                                                     
---------------------------------------------------                                                                                                                                                     
                                                                                                                                                                                                        
   3 - access("T"."OBJECT_ID"="U"."OBJECT_ID")                                                                                                                                                          


Statistics
----------------------------------------------------------                                                                                                                                              
        516  recursive calls                                                                                                                                                                            
     798988  db block gets                                                                                                                                                                              
      20273  consistent gets                                                                                                                                                                            
     427021  physical reads                                                                                                                                                                             
  267251496  redo size                                                                                                                                                                                  
        873  bytes sent via SQL*Net to client                                                                                                                                                           
       1067  bytes received via SQL*Net from client                                                                                                                                                     
          3  SQL*Net roundtrips to/from client                                                                                                                                                          
         34  sorts (memory)                                                                                                                                                                             
          0  sorts (disk)                                                                                                                                                                               
     771090  rows processed     

This was supposed to be much faster, but it really wasn’t: it was 8.3 times slower. What happened? Let’s compare our stats side by side:

Nested Loop                                                 Hash Join
    
Statistics                                                  Statistics
----------------------------------------------------------  ----------------------------------------------------------  
        113  recursive calls                                        516  recursive calls                                
    1587711  db block gets                                       798988  db block gets                                  
    1581779  consistent gets                                      20273  consistent gets                                
      21453  physical reads                                      427021  physical reads                                 
  449794452  redo size                                        267251496  redo size                                      
        873  bytes sent via SQL*Net to client                       873  bytes sent via SQL*Net to client               
       1125  bytes received via SQL*Net from client                1067  bytes received via SQL*Net from client         
          3  SQL*Net roundtrips to/from client                        3  SQL*Net roundtrips to/from client              
          2  sorts (memory)                                          34  sorts (memory)                                 
          0  sorts (disk)                                             0  sorts (disk)                                   
     771090  rows processed                                      771090  rows processed    

Our physical reads have gone from 21,453 to 427,021; that’s 20 times larger, and that’s enough for me to point a finger. Why did we do more physical reads when this was supposed to do fewer? Full tablescans are able to do multiblock reads and to read each block only once. And look at those consistent gets: we’ve gone from 1,581,779 to 20,273… that’s a massive reduction, so why did the physical reads go up?

Feel free to share your own thoughts; I will share an explanation in a follow-up post (soon, promise!) with supporting traces.
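If you want to investigate along with me, a simple way to capture a statistics delta for a single statement is to snapshot v$mystat before and after the statement and diff the values; a sketch below (the direct path statistics are the ones I’d keep an eye on):

select sn.name, st.value
from   v$mystat   st
join   v$statname sn
on     sn.statistic# = st.statistic#
where  sn.name in ('db block gets','consistent gets','physical reads'
                  ,'physical reads direct'
                  ,'physical reads direct temporary tablespace'
                  ,'redo size');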

Mostly NULL column in a composite index

This is just a fun one to show something special that can happen when you allow NULL values in your indexes. Let’s get some basics down first:

  • NULL is special: it is the absence of a value. Oracle knows that it is special and even gives you the greatest present it could ever give you: a free frequency histogram bucket just for NULLs.
  • If all the columns in an index are NULL for a row, there will be no index key for that row (a quick sketch of this follows the list).
  • Indexes have their own statistics, but no concept of the number of NULLs.
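A minimal sketch of the second point (the table and index names are made up for this demo); rows where the only indexed column is NULL simply don’t get an index entry, which shows up in the index’s num_rows:

drop table null_key_demo;
create table null_key_demo as
select case when rownum <= 10 then rownum end x
from   dual connect by rownum <= 100;
create index null_key_demo_i on null_key_demo (x);
exec dbms_stats.gather_table_stats(null,'null_key_demo')
select num_rows from user_tables  where table_name = 'NULL_KEY_DEMO';   -- 100
select num_rows from user_indexes where index_name = 'NULL_KEY_DEMO_I'; -- 10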

So, here’s my set up:

drop table as_magic_indexes;
create table as_magic_indexes 
as 
with ctrl as (select 1 from dual connect by rownum <= 1000) -- >comment for wordpress formatting
select rownum pk_col, mod(rownum,10) filter_10, rpad('x',400,'x') padding_col
      ,cast(null as number) important_col 
      ,mod(rownum,100) magic_col 
from ctrl o 
cross join ctrl a;
update as_magic_indexes
set    important_col = pk_col 
where  pk_col >= (select max(pk_col) from as_magic_indexes)-20;
commit;

We have a 1 million row table (even if WordPress has cut off some of the script: ctrl has 1000 rows, and we cross join it with itself). 21 rows have a non-NULL value for the important_col column. If we gather statistics and get an explain plan for a select that filters on another column and this important_col, we should see that Oracle expects a low cardinality.
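A quick sanity check of those claims (count(important_col) only counts the non-NULL values):

select count(*) total_rows, count(important_col) not_null_rows
from   as_magic_indexes;

TOTAL_ROWS NOT_NULL_ROWS
---------- -------------
   1000000            21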

exec dbms_stats.gather_table_stats(null,'as_magic_indexes',method_opt=>'for all columns size 1')
select num_rows from user_tables where table_name = 'AS_MAGIC_INDEXES';
  NUM_ROWS
----------
   1000000
select column_name, num_nulls, num_distinct from user_tab_columns where table_name = 'AS_MAGIC_INDEXES';
COLUMN_NAME      NUM_NULLS NUM_DISTINCT
--------------- ---------- ------------
PK_COL                   0      1000000
FILTER_10                0           10
PADDING_COL              0            1
IMPORTANT_COL       999979           21
MAGIC_COL                0          100
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 1274804564
--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |     1 |   413 | 16057   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| AS_MAGIC_INDEXES |     1 |   413 | 16057   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("IMPORTANT_COL"=TO_NUMBER(:Y) AND "FILTER_10"=TO_NUMBER(:X))

An estimate of 1 row, in fact. A perfect recipe for an index?

create index as_magic_indexes_idx1 on as_magic_indexes (filter_10,important_col);
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key from user_indexes where index_name = 'AS_MAGIC_INDEXES_IDX1';
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- -----------------------
AS_MAGIC_INDEXES_IDX1                   2        2078            31                      67
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 1274804564
--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  | 32258 |    12M| 16057   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| AS_MAGIC_INDEXES | 32258 |    12M| 16057   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("IMPORTANT_COL"=TO_NUMBER(:Y) AND "FILTER_10"=TO_NUMBER(:X))

Oh, what happened there? The index statistics cover both our filter columns in one go; this is useful when there is a relationship between the columns (the same can be achieved with a column group statistic). The CBO will trust this grouped statistic more than it trusts combining two column-level statistics, but in doing so we’ve completely lost our num NULLs stat. Our new cardinality is num_rows from user_tables divided by distinct_keys from the index we’ve added (1,000,000/31 = 32,258).
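In case you want to experiment with the column group alternative, a sketch of creating one with dbms_stats.create_extended_stats (regather statistics afterwards so the new extension gets populated):

select dbms_stats.create_extended_stats(null,'AS_MAGIC_INDEXES','(FILTER_10,IMPORTANT_COL)')
from   dual;
exec dbms_stats.gather_table_stats(null,'as_magic_indexes',method_opt=>'for all columns size 1')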

What if we could have an index that covered our two filters and kept our free histogram?

drop index as_magic_indexes_idx1;
create index as_magic_indexes_idx2 on as_magic_indexes (filter_10,important_col,magic_col);
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key from user_indexes where index_name = 'AS_MAGIC_INDEXES_IDX2';
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- -----------------------
AS_MAGIC_INDEXES_IDX2                   2        2492           121                      20
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
Plan hash value: 3008093446
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                       |  8264 |  3333K|  8291   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| AS_MAGIC_INDEXES      |  8264 |  3333K|  8291   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | AS_MAGIC_INDEXES_IDX2 |  8264 |       |    23   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FILTER_10"=TO_NUMBER(:X) AND "IMPORTANT_COL"=TO_NUMBER(:Y))

All we had to do was add another column to the index, giving it a larger number of distinct keys. I’m adding magic_col because I was planning on turning this into a quiz but decided a blog post would be better. This gave us an explain plan that uses the index, although we still have a large cardinality estimate (it’s still num_rows/distinct_keys: 1,000,000/121 = 8,264); this could trip us up later. Note that if the old index still existed, its statistics would still be in play (and you’d get the full tablescan again).

What if we added a column that was unique instead? This could be a sensible idea because we know that our important_col is unique for its non-NULL values.

drop index as_magic_indexes_idx2;
create index as_magic_indexes_idx3 on as_magic_indexes (filter_10,important_col,pk_col);
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key from user_indexes where index_name = 'AS_MAGIC_INDEXES_IDX3';
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- -----------------------
AS_MAGIC_INDEXES_IDX3                   2        2769       1000000                       1
explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col = :y;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
Plan hash value: 4058028991
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                       |     1 |   413 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| AS_MAGIC_INDEXES      |     1 |   413 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | AS_MAGIC_INDEXES_IDX3 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FILTER_10"=TO_NUMBER(:X) AND "IMPORTANT_COL"=TO_NUMBER(:Y))

Problem solved? Maybe. With all that manipulation, we can be sure we’ve messed something up somehow…
Do we still get an accurate cardinality if we filter on important_col is null (which should match a lot of the table)?

explain plan for
select * from as_magic_indexes
where  filter_10 = :x and important_col is null;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 1274804564
--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  | 99998 |    39M| 16056   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| AS_MAGIC_INDEXES | 99998 |    39M| 16056   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FILTER_10"=TO_NUMBER(:X) AND "IMPORTANT_COL" IS NULL)

That seems right to me: 99,998 ≈ num_nulls of important_col / num_distinct of filter_10 (999,979/10). However, let’s not get too confident; just because I can’t think of an immediate problem this has caused doesn’t mean one isn’t possible. Every time I think I’ve done something clever, I’ve probably messed something else up in a stupid way…

I hope you found this useful. I can imagine this sort of indexing might be helpful in other scenarios too, or at least it will give you some imaginative ideas when trying to solve interesting performance problems.