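I know of the following ways to select values that exist in one table but not in another:
LEFT JOIN, NOT IN and NOT EXISTS
Which option is recommended?
There may be no universal answer, so I would appreciate use cases in which each one is preferable.
(I am not looking for the syntax of these options, only a comparison of the approaches.)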
Answer 0: (score: 0)
In short, LEFT JOIN takes slightly more time than the other two, while NOT EXISTS and NOT IN take almost the same time.
I prefer left join when I need values from the other table in the select clause. Otherwise I prefer not exists.
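As a rough illustration of that preference, here is a sketch against the test tables created further down in this answer (test_data_left and test_data_right). The alias right_ename is made up for the example.

```sql
-- LEFT JOIN: columns of the other table are available in the select list.
-- right_ename is NULL for rows that have no match in test_data_right.
select t1.empno, t1.ename, t2.ename as right_ename
from test_data_left t1
left join test_data_right t2 on t1.empno = t2.empno;

-- NOT EXISTS: only an existence test is needed; nothing is selected from t2.
select t1.empno, t1.ename
from test_data_left t1
where not exists (select 1 from test_data_right t2 where t2.empno = t1.empno);
```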
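I suggest you replicate the test below on your own machine, because mine is a home machine running Oracle 12c with hardly anything else running. A larger environment may well give more accurate results.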
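Detailed test:
To actually test this, I create two tables. The first is filled with 1 million rows. The second is populated from the first with an extra condition, so that some rows are not inserted into the second table.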
--Create first table
create table test_data_left (empno integer, ename varchar2(10),CONSTRAINT tdl_pk primary key(empno));
--PL/SQL block to insert 1 million rows into test_data_left
declare v_max_empno integer;
BEGIN
-- start after the highest existing empno so the block can be re-run
select coalesce(max(empno),0) into v_max_empno from test_data_left;
FOR i IN 1..1000000 LOOP -- add 1 million rows
insert into test_data_left(empno,ename) values (
i+v_max_empno,
DBMS_RANDOM.string('U',TRUNC(DBMS_RANDOM.value(10,11)))
);
END LOOP;
END;
/
commit;
--Create second table and populate with some condition to block some rows from first table
create table test_data_right (empno integer, ename varchar2(10),CONSTRAINT tdr_pk primary key(empno));
insert into test_data_right (empno,ename)
select empno,ename from test_data_left
where ename not like 'JK%';
commit;
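These are the queries I used to fetch the data.
Note: I did not use t1.* in the select statement, because SQL Developer only shows the first 50 rows and you cannot run an explain plan on that. So I used count(*) instead.
select count(*) from test_data_left t1 left join test_data_right t2 on
t1.empno=t2.empno where t2.empno is null;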
select count(*) from test_data_left t1
where t1.empno not in (select empno from test_data_right);
select count(*) from test_data_left t1
where not exists (select 1 from test_data_right t2 where t1.empno=t2.empno);
To gather the statistics of the last query that was run, I used this command.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(null,null,'ALLSTATS LAST')) ;
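The A-Rows, A-Time and Buffers columns in the plans are only filled in when row-source statistics are collected for the statement. The answer does not say how that was enabled, so the following is only a sketch of the two usual ways to do it, assuming the test tables from above:

```sql
-- Option 1: collect row-source statistics for every statement in this session.
alter session set statistics_level = all;

select count(*) from test_data_left t1
where not exists (select 1 from test_data_right t2 where t1.empno = t2.empno);

-- Run this in the same session, directly after the query of interest.
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- Option 2: collect the statistics for a single statement with a hint.
select /*+ gather_plan_statistics */ count(*) from test_data_left t1
where not exists (select 1 from test_data_right t2 where t1.empno = t2.empno);

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

Passing null for the first two arguments makes DISPLAY_CURSOR report on the last statement executed in the session, which is why it has to follow the measured query immediately.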
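To make sure Oracle was not doing anything funny in the measurements, I reset the database connection before running each query.
Below are the statistics after each query. I then repeated the runs in reverse order to give LEFT JOIN a fair chance.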
As far as I can see, LEFT JOIN is the slowest, while NOT IN and NOT EXISTS are almost the same (based on a few more iterations that I could not capture). All three queries end up with the same plan (plan hash value 2082679279, NESTED LOOPS ANTI), so the differences are small.
ITERATION 1
LEFT JOIN
SQL_ID 0qz2qtza4yrr0, child number 0
-------------------------------------
select count(*) from test_data_left t1 left join test_data_right t2 on
t1.empno=t2.empno where t2.empno is null
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.41 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.41 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.32 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.22 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.54 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="T2"."EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
NOT EXISTS
SQL_ID c498qdbzw5dxv, child number 0
-------------------------------------
select count(*) from test_data_left t1 where not exists (select 1 from
test_data_right t2 where t1.empno=t2.empno)
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.27 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.27 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.19 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.21 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.49 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="T2"."EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
NOT IN
SQL_ID gwm775xqnufgm, child number 0
-------------------------------------
select count(*) from test_data_left t1 where t1.empno not in (select
empno from test_data_right)
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.23 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.23 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.15 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.19 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.47 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
ITERATION 2
NOT IN
SQL_ID gwm775xqnufgm, child number 0
-------------------------------------
select count(*) from test_data_left t1 where t1.empno not in (select
empno from test_data_right)
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.19 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.19 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.11 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.19 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.46 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
NOT EXISTS
SQL_ID c498qdbzw5dxv, child number 0
-------------------------------------
select count(*) from test_data_left t1 where not exists (select 1 from
test_data_right t2 where t1.empno=t2.empno)
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.19 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.19 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.12 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.19 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.46 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="T2"."EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
LEFT JOIN
SQL_ID 0qz2qtza4yrr0, child number 0
-------------------------------------
select count(*) from test_data_left t1 left join test_data_right t2 on
t1.empno=t2.empno where t2.empno is null
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.33 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.33 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.24 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.22 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.50 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="T2"."EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Answer 1: (score: -1)
This returns everything from table a that has no corresponding record in table b:
SELECT a.col FROM a WHERE a.col NOT IN (SELECT b.col from b)
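One caveat worth checking before relying on this form: if b.col can contain NULL, NOT IN returns no rows at all, whereas NOT EXISTS still returns the unmatched rows. The sketch below shows this with two small throwaway tables. The names a, b and col follow the query above, and the data is invented for the example.

```sql
-- Throwaway demo tables; the data is hypothetical.
create table a (col integer);
create table b (col integer);
insert into a values (1);
insert into a values (2);
insert into b values (1);
insert into b values (null);
commit;

-- Returns no rows: "2 not in (1, null)" evaluates to UNKNOWN, not TRUE.
select a.col from a where a.col not in (select b.col from b);

-- Returns the row with col = 2.
select a.col from a where not exists (select 1 from b where b.col = a.col);

-- Also returns 2: filtering out the NULLs makes NOT IN behave like NOT EXISTS here.
select a.col from a
where a.col not in (select b.col from b where b.col is not null);
```

In the test from Answer 0 this difference does not show up, because empno is a primary key in both tables and therefore cannot be NULL.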
Answer 2: (score: -1)
Try the following query:
select tabA.* from tabA left join tabB on tabA.id = tabB.tabA_id
where tabB.tabA_id is null
Hope it helps.