Real Time Materialized Views (Part 2)

In part one I played around with some different materialized view logs on a standard table to get a feel for them. Now I’ll try and demonstrate what you’re really here for:

REAL TIME MATERIALIZED VIEWS!

Baby steps though: let’s start with a materialized view that is just a replica of a base table. We’ll use the same base table from part one and we’ll leave the final materialized view log in place:

create materialized view mv_t_1
as
select pk_col, number_col, string_col
from   base_t_1;
Materialized view created.
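(If you skipped part one: the base table and its materialized view log are assumed to look roughly like this. The exact DDL is in part one; this sketch is only here so the MLOG$ columns we query below make sense, and the column sizes are guesses.)

-- A sketch of the part-one setup (assumed, not the exact DDL):
-- a simple base table with a primary key, a sequence used to populate it,
-- and a materialized view log recording the PK, the other columns,
-- a sequence$$ ordering column and new values.
create sequence seq_1;

create table base_t_1 (
  pk_col     number       primary key,
  number_col number,
  string_col varchar2(40)
);

create materialized view log on base_t_1
  with primary key, sequence (number_col, string_col)
  including new values;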

You might wonder why such a simple materialized view could be of any use. A couple of things that come to mind:

  • You want to (heavily) index the table but don’t want your DML to hit those indexes (bitmap indexes would be a killer for OLTP performance in a multi-user environment); see the sketch just after this list
  • The base table is in a remote database (because you don’t want your batch/reporting queries to interfere with your OLTP workload)
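To make the first point concrete, the idea is that the reporting-style indexes live on the MView only, so DML against base_t_1 never has to maintain them. A quick sketch (the index names are made up for illustration):

-- Hypothetical reporting indexes created on the MView rather than the base table;
-- base_t_1 keeps only its primary key index, so OLTP DML stays cheap.
create bitmap index mv_t_1_number_bix on mv_t_1 (number_col);
create index mv_t_1_string_ix on mv_t_1 (string_col);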

We have our fresh MV and we have an empty materialized view log, so let’s get started on keeping the view fresh. We’ll take it one DML operation at a time, gradually building up our fresh view.

Inserting a new row

ANDY@orcl>insert into base_t_1 (pk_col, number_col, string_col)
  2  select seq_1.nextVal, trunc(dbms_random.value(0,100)), dbms_random.string('x',10)
  3  from   dual;

1 row created.

Elapsed: 00:00:00.14
ANDY@orcl>@print "select * from MLOG$_BASE_T_1"

PK_COL                        : 1021
NUMBER_COL                    : 73
STRING_COL                    : 1KHJWYSQBE
SEQUENCE$$                    : 310008
SNAPTIME$$                    : 01/01/4000 00:00:00
DMLTYPE$$                     : I
OLD_NEW$$                     : N
CHANGE_VECTOR$$               : FE
XID$$                         : 1970337721882360
-----------------
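(The @print call above is a local SQL*Plus helper that prints one column per line. If you don’t have something similar, a rough equivalent along these lines will do; this is a sketch built on dbms_sql, not the exact script I use.)

-- A print_table-style helper (sketch): prints each column of each row
-- on its own line as "COLUMN_NAME : value", mirroring the output above.
create or replace procedure print_table (p_query in varchar2)
authid current_user
is
  l_cursor  integer := dbms_sql.open_cursor;
  l_desc    dbms_sql.desc_tab;
  l_cols    number;
  l_value   varchar2(4000);
  l_status  integer;
begin
  dbms_sql.parse(l_cursor, p_query, dbms_sql.native);
  dbms_sql.describe_columns(l_cursor, l_cols, l_desc);
  for i in 1 .. l_cols loop
    dbms_sql.define_column(l_cursor, i, l_value, 4000);
  end loop;
  l_status := dbms_sql.execute(l_cursor);
  while dbms_sql.fetch_rows(l_cursor) > 0 loop
    for i in 1 .. l_cols loop
      dbms_sql.column_value(l_cursor, i, l_value);
      dbms_output.put_line(rpad(l_desc(i).col_name, 30) || ': ' || l_value);
    end loop;
    dbms_output.put_line('-----------------');
  end loop;
  dbms_sql.close_cursor(l_cursor);
end print_table;
/
-- Usage (needs set serveroutput on):
-- exec print_table('select * from mlog$_base_t_1')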

We’ve seen this so many times now. How do we replicate it in the MV? Easy: it’s an insert (dmltype$$ = ‘I’) and we know that this is the new version of the row (old_new$$ = ‘N’), so we can just union all the row in:

select pk_col, number_col, string_col from mv_t_1
union all
select pk_col, number_col, string_col from mlog$_base_t_1;

Updates

Of course, we know this is going to cause problems when a base table row is updated: we’ll get a row per version of the row in the base table and therefore duplicate rows in this query. We want to project only the latest version of each row; luckily we can do that with analytics, as we have our primary key and the sequence$$ value gives us the order of the DML.

select pk_col, number_col, string_col from mv_t_1
union all
select pk_col, number_col, string_col
from  (
select pk_col, number_col, string_col
     ,row_number() over (partition by pk_col order by sequence$$ desc) rown 
from mlog$_base_t_1
      )
where  rown = 1;

That should do the trick, right? Well, how do we know? I don’t fancy querying a 1000+ row base table and comparing it to a 1000+ row query by eye. Maybe we should create a reconciliation routine; my favourite method for doing this is the group by method demonstrated by Stew Ashton here: https://stewashton.wordpress.com/2014/02/04/compare-and-sync-tables-tom-kyte-and-group-by/ . If we define our fresh MV as a view it’ll be easier to work with, so I’ve created a view called fresh_t_1 using the query from earlier.
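For completeness, that first cut of fresh_t_1 is just the query above wrapped in a view, something like:

create or replace view fresh_t_1 as
select pk_col, number_col, string_col from mv_t_1
union all
select pk_col, number_col, string_col
from  (
select pk_col, number_col, string_col
     ,row_number() over (partition by pk_col order by sequence$$ desc) rown
from mlog$_base_t_1
      )
where  rown = 1;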

Here’s our reconciliation query; it should return zero rows if fresh_t_1 is exactly the same as base_t_1 (which is what we want). We’ll save it as a script so we can easily rerun it.

select pk_col, number_col, string_col
,sum(OLD_CNT) OLD_CNT, sum(NEW_CNT) NEW_CNT
FROM (
  select pk_col, number_col, string_col 
   , 1 OLD_CNT, 0 NEW_CNT
  from base_t_1 o
  union all
  select pk_col, number_col, string_col 
    , 0 OLD_CNT, 1 NEW_CNT
  from fresh_t_1 n
)
group by pk_col, number_col, string_col 
having sum(OLD_CNT) != sum(NEW_CNT);
save recon.sql

No rows, fantastic! So what happens when we update the row we just inserted:

ANDY@orcl>update base_t_1 set number_col = 100 where pk_col = 1021;
1 row updated.
@recon
no rows selected

So far, so good. What if we update one of the rows that already existed?

ANDY@orcl>update base_t_1 set number_col = 100 where pk_col = 1;

1 row updated.
ANDY@orcl>@recon
    PK_COL NUMBER_COL STRING_COL                                  OLD_CNT    NEW_CNT
---------- ---------- ---------------------------------------- ---------- ----------
         1         71 H2QDH5Z0SO                                        0          1

1 row selected.
What’s gone wrong? Hmm, let’s look for clues:
select * from fresh_t_1 where pk_col = 1;

    PK_COL NUMBER_COL STRING_COL
---------- ---------- ---------------------
         1         71 H2QDH5Z0SO
         1        100 H2QDH5Z0SO

2 rows selected.
It looks like we’ve still got the old version of the row here; that’s because it still exists in the MView. What can we do to fix that? One option would be to not project rows from the MView that have a row in the MView log for their PK:
create or replace view fresh_t_1 as
select pk_col, number_col, string_col 
from   mv_t_1
where  pk_col not in (select pk_col from mlog$_base_t_1)
union all
select pk_col, number_col, string_col
from  (
select pk_col, number_col, string_col
     ,row_number() over (partition by pk_col order by sequence$$ desc) rown 
from mlog$_base_t_1
      )
where  rown = 1;
Another option could be to left join from the MView to the analytic query and use NVL2 on the MView log PK column to decide which row source to take each column from. We would then not report such rows in the second part of the union all; checking whether the earliest logged version of the row was an Old row does that (an Old first version means the row already existed in the MView). Because we’ll be using that analytic query in both parts, we can put it in a WITH clause to make it easier to write (and maybe more performant). We can check the earliest version with another analytic function; here is the full thing:
create or replace view fresh_t_1_alt as
with mloglytic as (select pk_col, number_col, string_col, first_oldnew
                   from  (
                   select pk_col, number_col, string_col
                        ,row_number() over (partition by pk_col order by sequence$$ desc) rown 
                        ,max(old_new$$) keep (dense_rank first order by sequence$$) over (partition by pk_col) first_oldnew
                   from mlog$_base_t_1
                         )
                   where  rown = 1
                  )
select mv.pk_col, nvl2(mv_log.pk_col,mv_log.number_col,mv.number_col) number_col, nvl2(mv_log.pk_col,mv_log.string_col,mv.string_col) string_col
from   mv_t_1 mv
left join mloglytic mv_log
  on   mv.pk_col = mv_log.pk_col
union all
select pk_col, number_col, string_col
from   mloglytic
where  first_oldnew = 'N';
This looks slightly more complex than the prior solution, but that doesn’t make it bad. Both versions reconcile against the current data set; maybe one will perform better than the other? We can investigate that later, as there’s one more obvious DML operation we need to think about first.

Deletes

Let’s just see what happens with our two solutions when we do a delete on an existing row:

ANDY@orcl>delete base_t_1 where pk_col = 5;

1 row deleted.

Elapsed: 00:00:00.01
ANDY@orcl>@recon

    PK_COL NUMBER_COL STRING_COL                                  OLD_CNT    NEW_CNT
---------- ---------- ---------------------------------------- ---------- ----------
         5         22 8HGNZQAX2R                                        0          1
And the second view:
    PK_COL NUMBER_COL STRING_COL                                  OLD_CNT    NEW_CNT
---------- ---------- ---------------------------------------- ---------- ----------
         5         22 8HGNZQAX2R                                        0          1
They both suffer: the deleted row hasn’t been removed from either fresh view.
The first view suffers because the second part of the union all projects the latest version of every row in the log, even when that latest version is an old (deleted) image. We can address this by checking that the latest version of the row is a New version rather than an Old one (we already had to do something similar in the alternative view, using the first version of the row).
create or replace view fresh_t_1 as
select pk_col, number_col, string_col 
from   mv_t_1
where  pk_col not in (select pk_col from mlog$_base_t_1)
union all
select pk_col, number_col, string_col
from  (
select pk_col, number_col, string_col
      ,row_number() over (partition by pk_col order by sequence$$ desc) rown 
      ,old_new$$
from mlog$_base_t_1
      )
where  rown = 1
and    old_new$$ = 'N';
ANDY@orcl>@recon

no rows selected
Can we fix the alternate view? Sure, we just need to make sure that when we make the left join to our analytic query, we filter out any row that matches up to a deleted row. For this, we pull through the latest old_new$$ value.
create or replace view fresh_t_1_alt as
with mloglytic as (select pk_col, number_col, string_col, first_oldnew, old_new$$
                   from  (
                   select pk_col, number_col, string_col
                        ,row_number() over (partition by pk_col order by sequence$$ desc) rown 
                        ,max(old_new$$) keep (dense_rank first order by sequence$$) over (partition by pk_col) first_oldnew
                         ,old_new$$ old_new$$
                   from mlog$_base_t_1
                         )
                   where  rown = 1
                  )
select mv.pk_col, nvl2(mv_log.pk_col,mv_log.number_col,mv.number_col) number_col, nvl2(mv_log.pk_col,mv_log.string_col,mv.string_col) string_col
from   mv_t_1 mv
left join mloglytic mv_log
  on   mv.pk_col = mv_log.pk_col
where  mv_log.pk_col is null or mv_log.old_new$$ = 'N'
union all
select pk_col, number_col, string_col
from   mloglytic
where  first_oldnew = 'N';

Does it reconcile?

no rows selected
Yes!
What about a delete on a row we inserted ourselves (one not contained in the MV):
ANDY@orcl>delete base_T_1 where pk_col = 1021;

1 row deleted.

Elapsed: 00:00:00.01
ANDY@orcl>@recon

no rows selected

And the second?

    PK_COL NUMBER_COL STRING_COL                                  OLD_CNT    NEW_CNT
---------- ---------- ---------------------------------------- ---------- ----------
      1021         73 1KHJWYSQBE                                       0          1
Not so good. So where’s our latest problem? We forgot to make sure that, in the second part of the union all, the latest version of the row is not a delete. Let’s fix it:
create or replace view fresh_t_1_alt as
with mloglytic as (select pk_col, number_col, string_col, first_oldnew, old_new$$
                   from  (
                   select pk_col, number_col, string_col
                        ,row_number() over (partition by pk_col order by sequence$$ desc) rown 
                        ,max(old_new$$) keep (dense_rank first order by sequence$$) over (partition by pk_col) first_oldnew
                         ,old_new$$ old_new$$
                   from mlog$_base_t_1
                         )
                   where  rown = 1
                  )
select mv.pk_col, nvl2(mv_log.pk_col,mv_log.number_col,mv.number_col) number_col, nvl2(mv_log.pk_col,mv_log.string_col,mv.string_col) string_col
from   mv_t_1 mv
left join mloglytic mv_log
  on   mv.pk_col = mv_log.pk_col
where  mv_log.pk_col is null or mv_log.old_new$$ = 'N'
union all
select pk_col, number_col, string_col
from   mloglytic
where  first_oldnew = 'N'
and    old_new$$ = 'N';
Does it recon? Yes!

Performance

We have two views now, so let’s see what performance we can expect. I’m going to use explain plan just to see what Oracle is capable of doing for us; all I’m looking at is where filters are being applied. I don’t mind full table scans happening as such, I mind when they happen and pull through all of the rows (we can index for the opposite scenario). First, a primary key lookup against the fresh view:
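(Throughout this section I display the plans with an @x script; it’s just a helper that does something like the following, though the exact script isn’t shown here.)

-- A sketch of an @x-style helper: display the most recently explained plan.
select * from table(dbms_xplan.display);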

ANDY@orcl>explain plan for select * from fresh_t_1 where pk_col = :x;

Explained.

Elapsed: 00:00:00.01
ANDY@orcl>@x

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
Plan hash value: 454948185

----------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                |     2 |    96 |     9  (12)| 00:00:01 |
|   1 |  VIEW                             | FRESH_T_1      |     2 |    96 |     9  (12)| 00:00:01 |
|   2 |   UNION-ALL                       |                |       |       |            |          |
|*  3 |    HASH JOIN ANTI NA              |                |     1 |    31 |     5   (0)| 00:00:01 |
|   4 |     MAT_VIEW ACCESS BY INDEX ROWID| MV_T_1         |     1 |    18 |     2   (0)| 00:00:01 |
|*  5 |      INDEX UNIQUE SCAN            | SYS_C0014867   |     1 |       |     1   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL             | MLOG$_BASE_T_1 |     7 |    91 |     3   (0)| 00:00:01 |
|*  7 |    VIEW                           |                |     1 |    63 |     4  (25)| 00:00:01 |
|*  8 |     WINDOW SORT PUSHED RANK       |                |     1 |    63 |     4  (25)| 00:00:01 |
|*  9 |      TABLE ACCESS FULL            | MLOG$_BASE_T_1 |     1 |    63 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("PK_COL"="PK_COL")
   5 - access("PK_COL"=TO_NUMBER(:X))
   7 - filter("ROWN"=1 AND "OLD_NEW$$"='N')
   8 - filter(ROW_NUMBER() OVER ( PARTITION BY "PK_COL" ORDER BY
              INTERNAL_FUNCTION("SEQUENCE$$") DESC )<=1)
   9 - filter("PK_COL"=TO_NUMBER(:X))
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
Here we access the MView log twice; we apply filters on both of those accesses, so indexing pk_col on the log could potentially help (a sketch of what I mean follows). If we were worried about going to the table twice we could factor the log access out with a WITH clause, but that might affect our ability to push that predicate.
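Something like this hypothetical index is what I mean; it would need benchmarking, and bear in mind that every index on the log adds cost to the DML that populates it:

-- Hypothetical index to support pk_col lookups against the MView log (a sketch).
create index mlog$_base_t_1_pk on mlog$_base_t_1 (pk_col);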
Here’s the plan for the alt view:
Plan hash value: 2918522968

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                           |    14 |   672 |     6   (0)| 00:00:01 |
|   1 |  VIEW                               | FRESH_T_1_ALT             |    14 |   672 |     6   (0)| 00:00:01 |
|   2 |   TEMP TABLE TRANSFORMATION         |                           |       |       |            |          |
|   3 |    LOAD AS SELECT                   | SYS_TEMP_0FD9D6686_6B752F |       |       |            |          |
|*  4 |     VIEW                            |                           |     7 |   455 |     4  (25)| 00:00:01 |
|   5 |      WINDOW SORT                    |                           |     7 |   441 |     4  (25)| 00:00:01 |
|   6 |       TABLE ACCESS FULL             | MLOG$_BASE_T_1            |     7 |   441 |     3   (0)| 00:00:01 |
|   7 |    UNION-ALL                        |                           |       |       |            |          |
|*  8 |     FILTER                          |                           |       |       |            |          |
|   9 |      NESTED LOOPS OUTER             |                           |     7 |   476 |     4   (0)| 00:00:01 |
|  10 |       MAT_VIEW ACCESS BY INDEX ROWID| MV_T_1                    |     1 |    18 |     2   (0)| 00:00:01 |
|* 11 |        INDEX UNIQUE SCAN            | SYS_C0014867              |     1 |       |     1   (0)| 00:00:01 |
|* 12 |       VIEW                          |                           |     7 |   350 |     2   (0)| 00:00:01 |
|  13 |        TABLE ACCESS FULL            | SYS_TEMP_0FD9D6686_6B752F |     7 |   364 |     2   (0)| 00:00:01 |
|* 14 |     VIEW                            |                           |     7 |   364 |     2   (0)| 00:00:01 |
|  15 |      TABLE ACCESS FULL              | SYS_TEMP_0FD9D6686_6B752F |     7 |   364 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("ROWN"=1)
   8 - filter("MV_LOG"."PK_COL" IS NULL OR "MV_LOG"."OLD_NEW$$"='N')
  11 - access("MV"."PK_COL"=TO_NUMBER(:X))
  12 - filter("MV_LOG"."PK_COL"(+)=TO_NUMBER(:X))
  14 - filter("FIRST_OLDNEW"='N' AND "OLD_NEW$$"='N' AND "PK_COL"=TO_NUMBER(:X))
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
Now we only go to the MView log table once, but the primary key filter hasn’t been pushed into it; that could affect performance when there are large amounts of data in the MView log and we only care about a few rows. Is that going to be a problem? Are you likely to filter on the primary key (this would mostly apply where this is a replication environment)?
What if we filter on one of the other columns instead:
ANDY@orcl>explain plan for select * from fresh_t_1 where number_col =4;

Explained.

Elapsed: 00:00:00.02
ANDY@orcl>@x

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
Plan hash value: 2930466682

---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |     8 |   384 |    11  (10)| 00:00:01 |
|   1 |  VIEW                      | FRESH_T_1      |     8 |   384 |    11  (10)| 00:00:01 |
|   2 |   UNION-ALL                |                |       |       |            |          |
|*  3 |    HASH JOIN ANTI NA       |                |     7 |   217 |     7   (0)| 00:00:01 |
|*  4 |     MAT_VIEW ACCESS FULL   | MV_T_1         |    10 |   180 |     4   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL      | MLOG$_BASE_T_1 |     7 |    91 |     3   (0)| 00:00:01 |
|*  6 |    VIEW                    |                |     1 |    63 |     4  (25)| 00:00:01 |
|*  7 |     WINDOW SORT PUSHED RANK|                |     7 |   441 |     4  (25)| 00:00:01 |
|   8 |      TABLE ACCESS FULL     | MLOG$_BASE_T_1 |     7 |   441 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("PK_COL"="PK_COL")
   4 - filter("NUMBER_COL"=4)
   6 - filter("ROWN"=1 AND "OLD_NEW$$"='N' AND "NUMBER_COL"=4)
   7 - filter(ROW_NUMBER() OVER ( PARTITION BY "PK_COL" ORDER BY
              INTERNAL_FUNCTION("SEQUENCE$$") DESC )<=1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

We’re no longer able to push our predicate into the MView log table; that’s because we have to get all the rows belonging to a particular pk_col value first. This means we do two large full table scans, and one of them has to be sorted too. How does that compare to our alternative?

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
Plan hash value: 3702722754

---------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                           |  1011 | 48528 |     8   (0)| 00:00:01 |
|   1 |  VIEW                       | FRESH_T_1_ALT             |  1011 | 48528 |     8   (0)| 00:00:01 |
|   2 |   TEMP TABLE TRANSFORMATION |                           |       |       |            |          |
|   3 |    LOAD AS SELECT           | SYS_TEMP_0FD9D6687_6B752F |       |       |            |          |
|*  4 |     VIEW                    |                           |     7 |   455 |     4  (25)| 00:00:01 |
|   5 |      WINDOW SORT            |                           |     7 |   441 |     4  (25)| 00:00:01 |
|   6 |       TABLE ACCESS FULL     | MLOG$_BASE_T_1            |     7 |   441 |     3   (0)| 00:00:01 |
|   7 |    UNION-ALL                |                           |       |       |            |          |
|*  8 |     FILTER                  |                           |       |       |            |          |
|*  9 |      HASH JOIN RIGHT OUTER  |                           |  1004 | 68272 |     6   (0)| 00:00:01 |
|  10 |       VIEW                  |                           |     7 |   350 |     2   (0)| 00:00:01 |
|  11 |        TABLE ACCESS FULL    | SYS_TEMP_0FD9D6687_6B752F |     7 |   364 |     2   (0)| 00:00:01 |
|  12 |       MAT_VIEW ACCESS FULL  | MV_T_1                    |  1000 | 18000 |     4   (0)| 00:00:01 |
|* 13 |     VIEW                    |                           |     7 |   364 |     2   (0)| 00:00:01 |
|  14 |      TABLE ACCESS FULL      | SYS_TEMP_0FD9D6687_6B752F |     7 |   364 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("ROWN"=1)
   8 - filter(NVL2("MV_LOG"."PK_COL","MV_LOG"."NUMBER_COL","MV"."NUMBER_COL")=4 AND
              ("MV_LOG"."PK_COL" IS NULL OR "MV_LOG"."OLD_NEW$$"='N'))
   9 - access("MV"."PK_COL"="MV_LOG"."PK_COL"(+))
  13 - filter("FIRST_OLDNEW"='N' AND "OLD_NEW$$"='N' AND "NUMBER_COL"=4)
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
We do our one big WITH clause against the MView log (the same as before) and then join on the pk_col value a little later. You can appreciate that performance could be better here, as the big tablescan and sort is only done once; the later reads just use that materialized data set.
Our final query type would be extracting the entire view:
ANDY@orcl>explain plan for select * from fresh_t_1;

Explained.

Elapsed: 00:00:00.04
ANDY@orcl>@x

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
Plan hash value: 1423242017

---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |  1001 | 48048 |    11  (10)| 00:00:01 |
|   1 |  VIEW                      | FRESH_T_1      |  1001 | 48048 |    11  (10)| 00:00:01 |
|   2 |   UNION-ALL                |                |       |       |            |          |
|*  3 |    HASH JOIN RIGHT ANTI NA |                |  1000 | 31000 |     7   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL      | MLOG$_BASE_T_1 |     7 |    91 |     3   (0)| 00:00:01 |
|   5 |     MAT_VIEW ACCESS FULL   | MV_T_1         |  1000 | 18000 |     4   (0)| 00:00:01 |
|*  6 |    VIEW                    |                |     1 |    63 |     4  (25)| 00:00:01 |
|*  7 |     WINDOW SORT PUSHED RANK|                |     7 |   441 |     4  (25)| 00:00:01 |
|   8 |      TABLE ACCESS FULL     | MLOG$_BASE_T_1 |     7 |   441 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("PK_COL"="PK_COL")
   6 - filter("ROWN"=1 AND "OLD_NEW$$"='N')
   7 - filter(ROW_NUMBER() OVER ( PARTITION BY "PK_COL" ORDER BY
              INTERNAL_FUNCTION("SEQUENCE$$") DESC )<=1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

27 rows selected.

Elapsed: 00:00:00.28
ANDY@orcl>explain plan for select * from fresh_t_1_alt;

Explained.

Elapsed: 00:00:00.03
ANDY@orcl>@X

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
Plan hash value: 3702722754

---------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                           |  1011 | 48528 |     8   (0)| 00:00:01 |
|   1 |  VIEW                       | FRESH_T_1_ALT             |  1011 | 48528 |     8   (0)| 00:00:01 |
|   2 |   TEMP TABLE TRANSFORMATION |                           |       |       |            |          |
|   3 |    LOAD AS SELECT           | SYS_TEMP_0FD9D6688_6B752F |       |       |            |          |
|*  4 |     VIEW                    |                           |     7 |   455 |     4  (25)| 00:00:01 |
|   5 |      WINDOW SORT            |                           |     7 |   441 |     4  (25)| 00:00:01 |
|   6 |       TABLE ACCESS FULL     | MLOG$_BASE_T_1            |     7 |   441 |     3   (0)| 00:00:01 |
|   7 |    UNION-ALL                |                           |       |       |            |          |
|*  8 |     FILTER                  |                           |       |       |            |          |
|*  9 |      HASH JOIN RIGHT OUTER  |                           |  1004 | 68272 |     6   (0)| 00:00:01 |
|  10 |       VIEW                  |                           |     7 |   350 |     2   (0)| 00:00:01 |
|  11 |        TABLE ACCESS FULL    | SYS_TEMP_0FD9D6688_6B752F |     7 |   364 |     2   (0)| 00:00:01 |
|  12 |       MAT_VIEW ACCESS FULL  | MV_T_1                    |  1000 | 18000 |     4   (0)| 00:00:01 |
|* 13 |     VIEW                    |                           |     7 |   364 |     2   (0)| 00:00:01 |
|  14 |      TABLE ACCESS FULL      | SYS_TEMP_0FD9D6688_6B752F |     7 |   364 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("ROWN"=1)
   8 - filter("MV_LOG"."PK_COL" IS NULL OR "MV_LOG"."OLD_NEW$$"='N')
   9 - access("MV"."PK_COL"="MV_LOG"."PK_COL"(+))
  13 - filter("FIRST_OLDNEW"='N' AND "OLD_NEW$$"='N')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
Again, I can appreciate that the alternative view could be quicker, as we do the big work just once.

Summary

We’ve managed to create two versions of a view that provide Real Time Materialized View behaviour on versions of Oracle below 12.2. The two versions have different performance profiles, and some manual tuning is possible if you know your use case well.

Our final views:

create or replace view fresh_t_1 as
select pk_col, number_col, string_col 
from   mv_t_1
where  pk_col not in (select pk_col from mlog$_base_t_1)
union all
select pk_col, number_col, string_col
from  (
select pk_col, number_col, string_col
      ,row_number() over (partition by pk_col order by sequence$$ desc) rown 
      ,old_new$$
from mlog$_base_t_1
      )
where  rown = 1
and    old_new$$ = 'N';

create or replace view fresh_t_1_alt as
with mloglytic as (select pk_col, number_col, string_col, first_oldnew, old_new$$
                   from  (
                   select pk_col, number_col, string_col
                        ,row_number() over (partition by pk_col order by sequence$$ desc) rown 
                        ,max(old_new$$) keep (dense_rank first order by sequence$$) over (partition by pk_col) first_oldnew
                         ,old_new$$ old_new$$
                   from mlog$_base_t_1
                         )
                   where  rown = 1
                  )
select mv.pk_col, nvl2(mv_log.pk_col,mv_log.number_col,mv.number_col) number_col, nvl2(mv_log.pk_col,mv_log.string_col,mv.string_col) string_col
from   mv_t_1 mv
left join mloglytic mv_log
  on   mv.pk_col = mv_log.pk_col
where  mv_log.pk_col is null or mv_log.old_new$$ = 'N'
union all
select pk_col, number_col, string_col
from   mloglytic
where  first_oldnew = 'N'
and    old_new$$ = 'N';

In reviewing this post to put it live, I realised that another possibility would be to use the first part of the original view and have the second part use an existence check to determine whether the row in the MView log is the latest for its PK. I believe this would allow predicate pushing for both tables. I’ll post this revised solution when I have the time.

Part 3 may be a couple of weeks away; I’ll be sharing my solution for real time aggregate MViews, with and without a group by.

-Update 18/03/2017:

And here’s my solution, which I believe gives the best performance in most situations. SQL and plan first:

create index mlog$_base_t_1_pk_seq on mlog$_base_t_1 (pk_col, sequence$$);
create or replace view fresh_t_1_perf as
select pk_col, number_col, string_col 
from   mv_t_1
where  pk_col not in (select pk_col from mlog$_base_t_1)
union all
select pk_col, number_col, string_col
from   mlog$_base_t_1 b1
where  not exists (select null from mlog$_base_t_1 ssq where ssq.pk_col = b1.pk_col and ssq.sequence$$ > b1.sequence$$)
and    old_new$$ = 'N';
explain plan for select * from fresh_t_1_perf where number_col between 50 and 51;

Plan hash value: 213855121

-------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                       |   316 | 15168 |    58   (0)| 00:00:01 |
|   1 |  VIEW                   | FRESH_T_1_PERF        |   316 | 15168 |    58   (0)| 00:00:01 |
|   2 |   UNION-ALL             |                       |       |       |            |          |
|*  3 |    HASH JOIN ANTI NA    |                       |   315 |  9765 |    35   (0)| 00:00:01 |
|*  4 |     MAT_VIEW ACCESS FULL| MV_T_1                |   326 |  5868 |    13   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL   | MLOG$_BASE_T_1        |  9284 |   117K|    22   (0)| 00:00:01 |
|   6 |    NESTED LOOPS ANTI    |                       |     1 |    89 |    23   (0)| 00:00:01 |
|*  7 |     TABLE ACCESS FULL   | MLOG$_BASE_T_1        |     1 |    63 |    22   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN    | MLOG$_BASE_T_1_PK_SEQ |     1 |    26 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("PK_COL"="PK_COL")
   4 - filter("NUMBER_COL">=50 AND "NUMBER_COL"<=51)
   7 - filter("OLD_NEW$$"='N' AND "NUMBER_COL">=50 AND "NUMBER_COL"<=51)
   8 - access("SSQ"."PK_COL"="B1"."PK_COL" AND "SSQ"."SEQUENCE$$">"B1"."SEQUENCE$$" AND
              "SSQ"."SEQUENCE$$" IS NOT NULL)
Here the optimizer has opted for a full tablescan of our MView, believing it will find enough rows to make a hash join against our MView log worthwhile (full scanning costs 22 as opposed to doing 326 index lookups). The second part of the union all also does a full scan of the MView log; it has pushed our main predicate and believes it will only have to check whether one row from the MView log is the latest for that primary key (this time via a nested loops anti join).
I’ve managed to get 12.2 running on my sandbox setup at home and felt it was the perfect opportunity to do some comparisons. It turns out that this simple materialized view is not eligible for on query computation, or at least it wasn’t immediately available to me. Hopefully a fair competition will be possible when we do our aggregate materialized views.
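For reference, the 12.2 feature I tried to compare against is declared at MView creation time, roughly like this (a sketch of the syntax; the name and refresh options are illustrative, and as noted this simple MView didn’t qualify in my test):

-- A sketch of the 12.2 real time materialized view syntax
-- (illustrative only; eligibility rules for on query computation apply).
create materialized view mv_t_1_rt
refresh fast on demand
enable on query computation
as
select pk_col, number_col, string_col
from   base_t_1;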