Join elimination not working in Oracle with subqueries

Time: 2016-11-08 15:04:12

Tags: sql database oracle cost-based-optimizer anchor-modeling

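I am able to get join elimination to work for simple cases such as one-to-one relations, but not for slightly more complicated scenarios. Ultimately I want to try anchor modeling, but first I need to find a way around this problem. I am using Oracle 12c Enterprise Edition Release 12.1.0.2.0.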

The DDL for my test case:

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk primary key(product_id, from_date)
);

Some sample data:

insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price values(1, date '2016-01-01', 10);
insert into product_price values(1, date '2016-02-01', 8);
insert into product_price values(1, date '2016-05-01', 5);

insert into product_price values(2, date '2016-02-01', 5);

insert into product_price values(4, date '2016-01-01', 10);

commit;

The 5NF view

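This first view does not compile - it fails with ORA-01799: a column may not be outer-joined to a subquery. Unfortunately, this is how most of the historized views are defined in the online anchor modeling examples I have looked at...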

create view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )
     left join product_price pp on(
          pp.product_id = p.product_id
      and pp.from_date  = (select max(pp2.from_date) 
                             from product_price pp2 
                            where pp2.product_id = pp.product_id)
     );

Below is my attempt at fixing it. When this view is used with a simple select of product_id, Oracle manages to eliminate product_color, but not product_price.

create view product_5nf as
   select product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc using(product_id)
     left join (select pp1.product_id, pp1.price 
                  from product_price pp1
                 where pp1.from_date  = (select max(pp2.from_date) 
                                           from product_price pp2 
                                          where pp2.product_id = pp1.product_id)
              )pp using(product_id);

select product_id
  from product_5nf;

----------------------------------------------------------
| Id  | Operation             | Name             | Rows  |
----------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |     4 |
|*  1 |  HASH JOIN OUTER      |                  |     4 |
|   2 |   INDEX FAST FULL SCAN| PRODUCT_PK       |     4 |
|   3 |   VIEW                |                  |     3 |
|   4 |    NESTED LOOPS       |                  |     3 |
|   5 |     VIEW              | VW_SQ_1          |     5 |
|   6 |      HASH GROUP BY    |                  |     5 |
|   7 |       INDEX FULL SCAN | PRODUCT_PRICE_PK |     5 |
|*  8 |     INDEX UNIQUE SCAN | PRODUCT_PRICE_PK |     1 |
----------------------------------------------------------

The only solution I have found is to use a scalar subquery instead, like this:

create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,(select pp.price
             from product_price pp
            where pp.product_id = p.product_id
              and pp.from_date = (select max(from_date)
                                    from product_price pp2
                                   where pp2.product_id = pp.product_id)) as price
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     );

select product_id
  from product_5nf;

---------------------------------------------------
| Id  | Operation            | Name       | Rows  |
---------------------------------------------------
|   0 | SELECT STATEMENT     |            |     4 |
|   1 |  INDEX FAST FULL SCAN| PRODUCT_PK |     4 |
---------------------------------------------------

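Now Oracle successfully eliminates the product_price table. However, scalar subqueries are implemented differently from joins, and the way they are executed does not give me acceptable performance in a real-world scenario.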

TL;DR: How can I rewrite the view product_5nf so that Oracle successfully eliminates both of the dependent tables?

5 answers:

Answer 0 (score: 4)

I think you have two problems here.

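First, join elimination only works in certain specific situations (PK-PK, PK-FK, etc.). It is not a general feature where you can LEFT JOIN to any row set that returns a single row per join key value and have Oracle eliminate the join.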

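Second, even if Oracle were advanced enough to do join elimination on any LEFT JOIN where it knew it would get only one row per join key value, Oracle does not yet support join elimination on LEFT JOINs that are based on a composite key (Oracle support document 887553.1 says this is coming in R12.2).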
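To see the first point in isolation, the single-column case from the question can be checked on its own. This is only a sketch against the test tables above: product_color has a primary key on product_id alone, so the left join can return at most one row per product, and the question reports that this join is eliminated. The exact plan may differ with version and statistics.

```sql
-- Only columns of product are selected, and product_color is joined on its
-- single-column primary key, so this join is a candidate for elimination.
explain plan for
select p.product_id
  from product p
  left join product_color pc on pc.product_id = p.product_id;

-- Show the plan that was just explained.
select * from table(dbms_xplan.display);
```

If the join is eliminated, the plan should reference only PRODUCT_PK and not PRODUCT_COLOR. Repeating the same check with product_price, whose primary key is (product_id, from_date), shows where the composite-key limitation applies.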

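One workaround you could consider is to materialize a view containing the last row for each product_id, and then LEFT JOIN to that materialized view. Like this: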

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

-- Add a VIRTUAL column to PRODUCT_PRICE so that we can get all the data for 
-- the latest row by taking the MAX() of this column.
alter table product_price add ( sortable_row varchar2(80) generated always as ( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0'))  virtual not null );

-- Create a MV log so we can materialize a view having only the latest
-- row for each product_id and can refresh that MV fast on commit.
create materialized view log on product_price with sequence, primary key, rowid ( price  ) including new values;

-- Create the MV
create materialized view product_price_latest refresh fast on commit enable query rewrite as
SELECT product_id, max( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) sortable_row
FROM   product_price
GROUP BY product_id;

-- Create a primary key on the MV, so we can do join elimination
alter table product_price_latest add constraint ppl_pk primary key ( product_id );

-- Insert the OP's test data
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price ( product_id, from_date, price ) values(1, date '2016-01-01', 10 );
insert into product_price ( product_id, from_date, price) values(1, date '2016-02-01', 8);
insert into product_price ( product_id, from_date, price) values(1, date '2016-05-01', 5);

insert into product_price ( product_id, from_date, price) values(2, date '2016-02-01', 5);

insert into product_price ( product_id, from_date, price) values(4, date '2016-01-01', 10);

commit;

-- Create the 5NF view using the materialized view
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,to_date(substr(ppl.sortable_row,11,14),'YYYYMMDDHH24MISS') from_date
         ,to_number(substr(ppl.sortable_row,25)) price 
     from product p
     left join product_color pc on pc.product_id = p.product_id
     left join product_price_latest ppl on ppl.product_id = p.product_id 
;

-- The plan for this should not include any of the unnecessary tables.
select product_id from product_5nf;

-- Check the plan
SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

------------------------------------------------
| Id  | Operation        | Name       | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT |            |        |
|   1 |  INDEX FULL SCAN | PRODUCT_PK |      1 |
------------------------------------------------

Answer 1 (score: 2)

I could not get the price join eliminated, but if you do the following, you can at least reduce the price lookup to a single index access:

CREATE OR REPLACE view product_5nf as
select p.product_id
      ,pc.color
      ,pp.price 
 from product p
 left join product_color pc ON p.product_id = pc.product_id
 left join (select pp1.product_id, pp1.price 
              from (SELECT product_id,
                           price,
                           from_date,
                           max(from_date) OVER (PARTITION BY product_id) max_from_date
                    FROM   product_price) pp1
             where pp1.from_date = max_from_date) pp ON p.product_id = pp.product_id;

Answer 2 (score: 1)

  
    

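"Now Oracle successfully eliminates the product_price table. However, scalar subqueries are implemented differently from joins, and the way they are executed does not give me acceptable performance in a real-world scenario."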

  

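The cost-based optimizer in Oracle 12.1 can apply a query transformation that unnests scalar subqueries. So the performance can be as good as the LEFT JOIN you were after in your question.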

The trick is that you have to finagle it a bit.

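First, make sure the scalar subquery returns a max() with no group by, so the CBO knows there is no chance of getting multiple rows. (It will not unnest otherwise.)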

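Second, you need to combine all the fields from product_price into a single scalar subquery; otherwise the CBO will unnest each subquery separately and join to product_price multiple times.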
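The second point relies on packing several columns into one fixed-width string and unpacking it again outside the subquery. A minimal sketch of that round trip, using a made-up date and price rather than table data, and the same layout as the test case in this answer (14 characters of date followed by 10 characters of zero-padded price):

```sql
-- Pack a date and a price into one fixed-width string, then unpack both.
-- The literal values here are illustrative only.
select to_date(substr(s, 1, 14), 'YYYYMMDDHH24MISS') as from_date
      ,to_number(substr(s, -10))                     as price
  from (select to_char(date '2016-05-01', 'YYYYMMDDHH24MISS')
               || lpad(5, 10, '0') as s
          from dual);
```

Because the date part comes first and is fixed-width, max() over such strings picks the row with the latest from_date. Note that lpad truncates values longer than the target width, so this encoding assumes non-negative prices that fit in 10 characters.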

Here is a test case for Oracle 12.1 that illustrates the point.

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;


create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

insert into product ( product_id ) SELECT rownum FROM dual connect by rownum <= 100000;

insert into product_color ( product_id, color ) SELECT rownum, dbms_random.string('a',8) color FROM DUAL connect by rownum <= 100000;

--delete from product_price;
insert into product_price ( product_id, from_date, price ) SELECT product_id, trunc(sysdate) + dbms_random.value(-3,3) from_date, floor(dbms_random.value(50,120)/10)*10 price from product cross join lateral ( SELECT rownum x FROM dual connect by rownum <= mod(product_id,5));

commit;

begin
   dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT' );
   dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_COLOR' );
   dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_PRICE' );
end;
/

commit;

alter table product_price add ( composite_column varchar2(80) generated always as ( to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,0)) virtual );

create or replace view product_5nf as
   select d.product_id, d.color, to_date(substr(d.product_date_price,1,14),'YYYYMMDDHH24MISS') from_date, to_number(substr(d.product_date_price,-10)) price 
from 
(    select p.product_id
         ,pc.color
         ,( SELECT max(composite_column)  FROM product_price pp WHERE pp.product_id = p.product_id AND pp.from_date = ( SELECT max(pp2.from_date) FROM product_price pp2 WHERE pp2.product_id = pp.product_id ) ) product_date_price
     from product p
     left join product_color pc on pc.product_id = p.product_id )  d
;

select product_id from product_5nf;

----------------------------------------------
| Id  | Operation         | Name    | E-Rows |
----------------------------------------------
|   0 | SELECT STATEMENT  |         |        |
|   1 |  TABLE ACCESS FULL| PRODUCT |    100K|
----------------------------------------------

select * from product_5nf;

SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

--------------------------------------------------------------------------------------
| Id  | Operation                | Name          | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |               |        |       |       |          |
|*  1 |  HASH JOIN RIGHT OUTER   |               |    100K|  8387K|  3159K| 8835K (0)|
|   2 |   VIEW                   | VW_SSQ_2      |      2 |       |       |          |
|   3 |    HASH GROUP BY         |               |      2 |    13M|  2332K|   12M (0)|
|   4 |     VIEW                 | VM_NWVW_3     |      2 |       |       |          |
|*  5 |      FILTER              |               |        |       |       |          |
|   6 |       HASH GROUP BY      |               |      2 |    23M|  5055K|   20M (0)|
|*  7 |        HASH JOIN         |               |    480K|    12M|  4262K|   17M (0)|
|   8 |         TABLE ACCESS FULL| PRODUCT_PRICE |    220K|       |       |          |
|   9 |         TABLE ACCESS FULL| PRODUCT_PRICE |    220K|       |       |          |
|* 10 |   HASH JOIN OUTER        |               |    100K|  5918K|  3056K| 5847K (0)|
|  11 |    TABLE ACCESS FULL     | PRODUCT       |    100K|       |       |          |
|  12 |    TABLE ACCESS FULL     | PRODUCT_COLOR |    100K|       |       |          |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_2"="P"."PRODUCT_ID")
   5 - filter("PP"."FROM_DATE"=MAX("PP2"."FROM_DATE"))
   7 - access("PP2"."PRODUCT_ID"="PP"."PRODUCT_ID")
  10 - access("PC"."PRODUCT_ID"="P"."PRODUCT_ID")

Answer 3 (score: 0)

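OK, I am answering my own question. The information in this answer is valid for Oracle Database 12c Enterprise Edition 12.1.0.2.0 - 64bit Production, but may not apply to later versions. Do not upvote this answer, because it does not actually answer the question.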

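Due to a specific limitation in the current version (as mentioned by Mathew McPeak), it is simply not possible to have Oracle fully eliminate the unnecessary joins in the underlying 5NF view. The limitation is that join elimination cannot be performed on left joins that are based on a composite key.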

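Any attempt to work around this limitation seems to introduce either duplication or update anomalies. The accepted answer demonstrates how to overcome this optimizer limitation by using a materialized view, thereby duplicating data. This answer shows how to solve the problem with less duplication, but with update anomalies.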

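This workaround is based on the fact that you can use nullable columns in a unique index. We store null for all historical versions, and the actual product_id for the latest version, which references the product table through a foreign key.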
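The nullable-unique behavior this relies on can be tried in isolation. A small sketch with a throwaway table (uk_demo is a hypothetical name used only for this illustration):

```sql
-- Hypothetical scratch table: a single nullable column with a unique constraint.
create table uk_demo(
   id number
  ,constraint uk_demo_uk unique(id)
);

insert into uk_demo values(null);
insert into uk_demo values(null);  -- accepted: all-null keys are not checked for uniqueness
insert into uk_demo values(1);
insert into uk_demo values(1);     -- rejected with ORA-00001 (unique constraint violated)

rollback;
drop table uk_demo;
```

This is why any number of historical rows can carry a null latest_id, while at most one row per product can carry the product_id itself.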

alter table product_price add(
   latest_id number
  ,constraint product_price_uk  unique(latest_id)
  ,constraint product_price_fk2 foreign key(latest_id) references product(product_id)
  ,constraint product_price_chk check(latest_id = product_id)
);

-- One-time update of existing data
update product_price a
   set a.latest_id = a.product_id
 where from_date = (select max(from_date) 
                      from product_price b 
                     where a.product_id = b.product_id);   

PRODUCT_ID FROM_DATE       PRICE  LATEST_ID
---------- ---------- ---------- ----------
         1 2016-01-01         10       null
         1 2016-02-01          8       null
         1 2016-05-01          5          1
         2 2016-02-01          5          2
         4 2016-01-01         10          4

-- New view definition             
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price
     from product p
     left join product_color pc on(pc.product_id = p.product_id)
     left join product_price pp on(pp.latest_id  = p.product_id);

Of course, latest_id now has to be maintained manually... Whenever a new record is inserted, the old record must first be updated to null.
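A minimal sketch of that maintenance step, assuming the new row's from_date is later than every existing from_date for the product (the product id, date and price below are made-up values):

```sql
-- Step 1: demote the current latest row for product 1, which frees
-- the unique key on latest_id for that product.
update product_price
   set latest_id = null
 where product_id = 1
   and latest_id is not null;

-- Step 2: insert the new version and mark it as the latest one.
-- The check constraint requires latest_id to equal product_id when it is set.
insert into product_price(product_id, from_date, price, latest_id)
values(1, date '2016-06-01', 4, 1);

commit;
```

Running both statements in one transaction keeps readers from seeing a product with no latest price. If step 1 is skipped, the insert fails on product_price_uk, which at least prevents two rows from being flagged as latest for the same product.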

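This approach has two benefits. First, Oracle is able to completely eliminate the unnecessary joins. Second, the joins are not executed as scalar subqueries.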

SQL> select count(*) from product_5nf;

---------------------------------------
| Id  | Operation        | Name       |
---------------------------------------
|   0 | SELECT STATEMENT |            |
|   1 |  SORT AGGREGATE  |            |
|   2 |   INDEX FULL SCAN| PRODUCT_PK |
---------------------------------------

Oracle recognizes that the count can be resolved without touching the base tables. No unnecessary joins in sight...

SQL> select product_id, price from product_5nf;

---------------------------------------------------------
| Id  | Operation                    | Name             |
---------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |
|*  1 |  HASH JOIN OUTER             |                  |
|   2 |   INDEX FULL SCAN            | PRODUCT_PK       |
|   3 |   TABLE ACCESS BY INDEX ROWID| PRODUCT_PRICE    |
|*  4 |    INDEX FULL SCAN           | PRODUCT_PRICE_UK |
---------------------------------------------------------

Oracle recognizes that we have to join to product_price to get the price column. product_color is nowhere to be seen...

SQL> select * from product_5nf;

----------------------------------------------------------
| Id  | Operation                     | Name             |
----------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |
|*  1 |  HASH JOIN OUTER              |                  |
|   2 |   NESTED LOOPS OUTER          |                  |
|   3 |    INDEX FULL SCAN            | PRODUCT_PK       |
|   4 |    TABLE ACCESS BY INDEX ROWID| PRODUCT_COLOR    |
|*  5 |     INDEX UNIQUE SCAN         | PRODUCT_COLOR_PK |
|   6 |   TABLE ACCESS BY INDEX ROWID | PRODUCT_PRICE    |
|*  7 |    INDEX FULL SCAN            | PRODUCT_PRICE_UK |
----------------------------------------------------------

Here Oracle has to perform all the joins, because all of the columns are referenced.

Answer 4 (score: 0)

[I do not know whether an anti-join counts as a subquery in Oracle], but the not exists trick is often a way to avoid an aggregating subquery:

CREATE VIEW product_5nfa as
   SELECT p.product_id
         ,pc.color
         ,pp.price
     FROM product p
     LEFT JOIN product_color pc
        ON pc.product_id = p.product_id
     LEFT join product_price pp
        ON pp.product_id = p.product_id
        AND NOT EXISTS ( SELECT * FROM product_price pp2
            WHERE pp2.product_id = pp.product_id
            AND pp2.from_date  > pp.from_date
            )   
     ;

Comment from the OP: The view is created, but Oracle is still unable to eliminate the join. Here is the execution plan.

select count(*) from product_5nfa;

-------------------------------------------------
| Id  | Operation            | Name             |
-------------------------------------------------
|   0 | SELECT STATEMENT     |                  |
|   1 |  SORT AGGREGATE      |                  |
|   2 |   NESTED LOOPS OUTER |                  |
|   3 |    INDEX FULL SCAN   | PRODUCT_PK       |
|   4 |    VIEW              |                  |
|   5 |     NESTED LOOPS ANTI|                  |
|*  6 |      INDEX RANGE SCAN| PRODUCT_PRICE_PK |
|*  7 |      INDEX RANGE SCAN| PRODUCT_PRICE_PK |
-------------------------------------------------