Following up on this question: suppose I have now set up the indexes, and I only want to return certain fields, without duplicates:
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2
where A.year=2016
and B.year=2016
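The problem is that I get something like 150k cod values back, of which only 1,000 are distinct, so my query is very inefficient.
Question: how can I improve this? That is, how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
Thanks in advance!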
Answer 0 (score: 2)
My answer is based on your question:
how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
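Use an EXISTS clause: as soon as it sees a match, it stops and moves on to the next record to check.
Adding DISTINCT filters out any duplicate cod values (if there are any).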
select DISTINCT cod
from A ax
where year = 2016
and exists ( select 1
from B bx
WHERE Ax.ID1 = Bx.ID1
AND Ax.ID2 = Bx.ID2
AND Ax.YEAR = Bx.YEAR);
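To see what EXISTS changes, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for Oracle. The toy rows are made up for illustration. It only compares how many rows each form produces before DISTINCT is applied; it says nothing about how a particular optimizer actually executes either query.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id1 INTEGER, id2 INTEGER, cod TEXT, year INTEGER);
    CREATE TABLE B (id1 INTEGER, id2 INTEGER, cod TEXT, year INTEGER);
""")

# Hypothetical toy data: three A rows that each match 50 rows in B,
# plus one A row (cod 'z') that has no match in B at all.
a_rows = [(1, 1, "x", 2016), (2, 2, "x", 2016), (3, 3, "y", 2016), (4, 4, "z", 2016)]
b_rows = [(i, i, "b", 2016) for i in (1, 2, 3) for _ in range(50)]
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", a_rows)
conn.executemany("INSERT INTO B VALUES (?, ?, ?, ?)", b_rows)

# The join from the question and the EXISTS form from this answer.
JOIN_FROM = """
    FROM A JOIN B ON A.id1 = B.id1 AND A.id2 = B.id2
    WHERE A.year = 2016 AND B.year = 2016
"""
EXISTS_FROM = """
    FROM A
    WHERE A.year = 2016
      AND EXISTS (SELECT 1 FROM B
                  WHERE A.id1 = B.id1 AND A.id2 = B.id2 AND A.year = B.year)
"""

def count_rows(from_clause):
    """Number of rows the query yields before DISTINCT is applied."""
    return conn.execute("SELECT COUNT(*) " + from_clause).fetchone()[0]

def distinct_cods(from_clause):
    """Sorted list of distinct cod values returned by the query."""
    sql = "SELECT DISTINCT A.cod " + from_clause + " ORDER BY A.cod"
    return [row[0] for row in conn.execute(sql)]

join_count = count_rows(JOIN_FROM)
exists_count = count_rows(EXISTS_FROM)
join_cods = distinct_cods(JOIN_FROM)
exists_cods = distinct_cods(EXISTS_FROM)

print("rows before DISTINCT, join:  ", join_count)
print("rows before DISTINCT, EXISTS:", exists_count)
print("distinct cod, join:  ", join_cods)
print("distinct cod, EXISTS:", exists_cods)
```

Both forms return the same distinct cod values, but the join hands one row per matching (A, B) pair to DISTINCT, while the EXISTS form hands over at most one row per row of A.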
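Edit: I was curious which solution (IN or EXISTS) would give me a better explain plan.
Create the first table definition
Create table A
(
ID1 number,
ID2 number,
cod varchar2(100),
year number
);
Insert 4,000,000 consecutive numbers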
BEGIN
FOR i IN 1..4000000 loop
insert into A (id1, id2, cod, year)
values (i, i , i, i);
end loop;
END;
commit;
Create table B and insert the same data into it
Create table B
as
select *
from A;
Re-insert the data from table A to create duplicates
insert into B
select *
from A;
Build the indexes mentioned in the previous post, Index on join and where
CREATE INDEX A_IDX ON A(year, id1, id2);
CREATE INDEX B_IDX ON B(year, id1, id2);
Update a bunch of rows so that multiple rows get year 2016:
update B
set year = 2016
where rownum < 20000;
update A
set year = 2016
where rownum < 20000;
commit;
Using EXISTS
Plan hash value: 1052726981
---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    44 |     7  (15)| 00:00:01 |
|   1 |  HASH UNIQUE                  |       |     1 |    44 |     7  (15)| 00:00:01 |
|   2 |   NESTED LOOPS SEMI           |       |     1 |    44 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| A     |     1 |    26 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | A_IDX |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | B_IDX |     2 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("BX"."YEAR"=2016 AND "AX"."ID1"="BX"."ID1" AND "AX"."ID2"="BX"."ID2")
filter("AX"."YEAR"="BX"."YEAR")
Using IN
Plan hash value: 3002464630
---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    44 |     7  (15)| 00:00:01 |
|   1 |  HASH UNIQUE                  |       |     1 |    44 |     7  (15)| 00:00:01 |
|   2 |   NESTED LOOPS                |       |     1 |    44 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| A     |     1 |    26 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | A_IDX |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | B_IDX |     1 |    18 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("YEAR"=2016 AND "ID1"="ID1" AND "ID2"="ID2")
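My test case is limited, but my guess is that the IN and EXISTS clauses execute almost identically.
Answer 1 (score: 0)
From the looks of it, what you are actually trying to do should be accomplished by this: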
select distinct cod
from A
where year = 2016
and (id1, id2) in (select id1, id2 from B where year = 2016)
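The subquery in the WHERE condition is a non-correlated query, so it will be evaluated only once. The IN condition is also evaluated with short-circuit evaluation: instead of a full join, it searches the subquery's results only until a match is found.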
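As a rough illustration of the multi-column IN condition, here is a small Python sketch that uses the standard-library sqlite3 module as a stand-in for Oracle (row-value IN needs SQLite 3.15 or later). The sample rows are invented for the example. It shows that the (id1, id2) pair has to match as a whole, and that the same cod can still come back more than once when DISTINCT is left out.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id1 INTEGER, id2 INTEGER, cod TEXT, year INTEGER);
    CREATE TABLE B (id1 INTEGER, id2 INTEGER, cod TEXT, year INTEGER);

    -- Hypothetical rows: two A rows share cod 'x'; (3, 9) matches B on id1 only;
    -- the last A row belongs to a different year.
    INSERT INTO A VALUES (1, 1, 'x', 2016), (2, 2, 'x', 2016),
                         (3, 9, 'y', 2016), (4, 4, 'z', 2015);
    INSERT INTO B VALUES (1, 1, 'b', 2016), (1, 1, 'b', 2016), (2, 2, 'b', 2016),
                         (3, 3, 'b', 2016), (4, 4, 'b', 2016);
""")

IN_FILTER = """
    FROM A
    WHERE year = 2016
      AND (id1, id2) IN (SELECT id1, id2 FROM B WHERE year = 2016)
"""

# Without DISTINCT: one row per matching row of A, however many B rows match.
cods_all = [r[0] for r in conn.execute("SELECT cod " + IN_FILTER + " ORDER BY cod")]

# With DISTINCT: duplicates that already exist in A are collapsed.
cods_distinct = [r[0] for r in conn.execute("SELECT DISTINCT cod " + IN_FILTER + " ORDER BY cod")]

print("without DISTINCT:", cods_all)
print("with DISTINCT:   ", cods_distinct)
```

The duplicate (1, 1) rows in B do not multiply the output, but the two A rows that share cod 'x' both pass the filter, so DISTINCT still matters.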
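Edit: As Migs Isip pointed out, there may be duplicate cod values in the original table, so "distinct" may still be needed. I edited my code to add it back after Migs posted his answer.
Answer 2 (score: -1)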
I'm not sure about your existing indexes, but you can improve your query by adding another JOIN condition, for example
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2 and
A.year = B.year -- this one
where A.year=2016;