Question

Following up on this question: suppose I now have the indexes in place and only want to return certain fields, without duplicates:

Select distinct A.cod 
  from A join B
       on A.id1=B.id1 and 
          A.id2=B.id2
 where A.year=2016
   and B.year=2016

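The problem is that the join returns something like 150k cod values, of which only about 1000 are distinct, so my query is very inefficient.

Question: how can I improve this? That is, how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?

Thanks in advance!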

Answer 1

My answer is based on this part of your question:

how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?

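Use an EXISTS clause: as soon as it sees a match, it stops and moves on to the next record to check. Adding DISTINCT filters out any duplicate cod values (if there are any).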

select DISTINCT cod
from   A ax
where  year = 2016
and    exists ( select 1
                from   B bx
                WHERE  Ax.ID1 = Bx.ID1
                AND Ax.ID2 = Bx.ID2
                AND Ax.YEAR = Bx.YEAR);

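Edit: I was curious which solution (IN or EXISTS) would give a better explain plan.

Create the first table definition:

Create table A
(
    id1  number,
    id2  number,
    cod  varchar2(100),
    year number
);

Insert 4,000,000 sequential numbers: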

BEGIN

    FOR i IN 1..4000000 loop

        insert into A (id1, id2, cod, year)
        values (i, i , i, i);

    end loop;

END;

commit;

Create table B and insert the same data into it:

Create table B
as
select  *
from    A;

Insert the data from table A again to create duplicates:

insert into B
select  *
from    A;

Build the indexes mentioned in the previous post, Index on join and where:

CREATE INDEX A_IDX ON A(year, id1, id2);
CREATE INDEX B_IDX ON B(year, id1, id2);

Update a bunch of rows so that multiple rows get the year 2016:

update B
set   year = 2016
where rownum < 20000;

update A
set   year = 2016
where rownum < 20000;

commit;

Using EXISTS

Check the explain plan:

Plan hash value: 1052726981

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    44 |     7  (15)| 00:00:01 |
|   1 |  HASH UNIQUE                  |       |     1 |    44 |     7  (15)| 00:00:01 |
|   2 |   NESTED LOOPS SEMI           |       |     1 |    44 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| A     |     1 |    26 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | A_IDX |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | B_IDX |     2 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("YEAR"=2016)
   5 - access("BX"."YEAR"=2016 AND "AX"."ID1"="BX"."ID1" AND "AX"."ID2"="BX"."ID2")
       filter("AX"."YEAR"="BX"."YEAR")

Using IN

Check the explain plan:

Plan hash value: 3002464630

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    44 |     7  (15)| 00:00:01 |
|   1 |  HASH UNIQUE                  |       |     1 |    44 |     7  (15)| 00:00:01 |
|   2 |   NESTED LOOPS                |       |     1 |    44 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| A     |     1 |    26 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | A_IDX |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | B_IDX |     1 |    18 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("YEAR"=2016)
5 - access("YEAR"=2016 AND "ID1"="ID1" AND "ID2"="ID2")

Although my test case is limited, I would guess that the IN and EXISTS clauses perform almost identically.
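For anyone who wants to reproduce plans like the two above, here is a minimal sketch. It assumes an Oracle session that owns tables A and B as built in the test case. Gathering statistics first is an addition of mine, not a step from the original test, and it may change the row estimates compared with the plans shown.

```sql
-- Assumption: run as the owner of A and B (USER resolves to the current schema).
-- Refresh optimizer statistics so the row estimates reflect the updated YEAR values.
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'A', cascade => TRUE);
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'B', cascade => TRUE);
END;
/

-- Store the plan for the EXISTS variant in PLAN_TABLE ...
EXPLAIN PLAN FOR
select distinct cod
from   A ax
where  year = 2016
and    exists ( select 1
                from   B bx
                where  ax.id1  = bx.id1
                and    ax.id2  = bx.id2
                and    ax.year = bx.year );

-- ... and print it, including the predicate section.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Repeating the last two statements with the IN form of the query produces the plan for that variant.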

Answer 2

From the looks of it, what you are actually trying to do should be written like this:

select distinct cod
from   A
where  year = 2016
  and  (id1, id2) in (select id1, id2 from B where year = 2016)

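The subquery in the WHERE condition is a non-correlated query, so it is evaluated only once. The IN condition is evaluated with short-circuit evaluation: instead of performing a full join, it searches the subquery's results only until a match is found.

Edit: As Migs Isip pointed out, there may be duplicate codes in the original table, so "distinct" may still be needed. I edited my code to add it back after Migs posted his answer.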
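As a quick sanity check that the rewrite returns the same cod values as the original join, one option is to subtract one result from the other. This is only a sketch, assuming Oracle's MINUS set operator and the A and B tables from the question; if the two forms agree, no rows should come back.

```sql
-- cod values produced by the original join but missing from the IN rewrite
select A.cod
from   A join B
       on  A.id1 = B.id1
       and A.id2 = B.id2
where  A.year = 2016
and    B.year = 2016
minus
select cod
from   A
where  year = 2016
and    (id1, id2) in (select id1, id2 from B where year = 2016);
```

Swapping the two halves around the MINUS checks the other direction.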

Answer 3

Not sure about your existing indexes, but you could improve your query by adding another JOIN condition, for example:

Select distinct A.cod 
  from A join B
       on A.id1=B.id1 and 
          A.id2=B.id2 and
          A.year = B.year  -- this one
 where A.year=2016;

Distinct clause on large join
