Efficient table comparison

Time: 2017-05-30 02:25:38

Tags: sql oracle

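I know the following approaches for selecting values that exist in one table but not in another:

LEFT JOIN, NOT IN and NOT EXISTS

Which option is recommended?

There may be no universal answer, so I would appreciate use cases where each one is preferable.

(I am not looking for the syntax of the options above, just a comparison of the approaches.)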

3 answers:

Answer 0 (score: 0)

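In short, LEFT JOIN takes slightly more time than the other two, while NOT EXISTS and NOT IN take almost the same time.

I prefer LEFT JOIN when I need to use values from the left-joined table in the select clause. Otherwise I prefer NOT EXISTS.

I suggest you replicate the test below on your own system, because mine is a home machine running Oracle 12c with almost nothing else running on it. In a larger environment the test may give more accurate results.

Detailed test:

To actually test it, I create 2 tables. The first table is loaded with 1 million rows, and the second is filled from the first with an extra condition, so some rows are not inserted into the second table.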

--Create first table
create table test_data_left (empno integer, ename varchar2(10), CONSTRAINT tdl_pk primary key(empno));

--PL/SQL block to insert 1 million rows into test_data_left
declare
  v_max_empno integer;
BEGIN
  -- start numbering after the highest existing empno (0 if the table is empty)
  select coalesce(max(empno),0) into v_max_empno from test_data_left;
  FOR i IN 1..1000000 LOOP  -- add 1 million rows
    insert into test_data_left(empno,ename) values (
      i+v_max_empno,
      -- random 10-character uppercase string
      DBMS_RANDOM.string('U',TRUNC(DBMS_RANDOM.value(10,11)))
    );
  END LOOP;
END;
/
commit;

--Create second table and populate with some condition to block some rows from first table

create  table test_data_right (empno integer, ename varchar2(10),CONSTRAINT tdr_pk primary key(empno));

insert into  test_data_right (empno,ename)
select empno,ename from test_data_left
where ename not like 'JK%';

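These are the queries I used to fetch the data. Note: I did not use t1.* in the select list, because SQL Developer only displays the first 50 rows and you cannot get the execution plan from that. So I used count(*) instead.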

select count(*) from test_data_left t1 left join test_data_right t2 on 
t1.empno=t2.empno where t2.empno is null;

select count(*) from test_data_left t1
where t1.empno not in (select empno from test_data_right);

select count(*) from test_data_left t1
where not exists (select 1 from test_data_right t2 where t1.empno=t2.empno);
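
The three queries above can be tried on a scaled-down copy of the test tables. The sketch below rests on several assumptions: it uses Python's built-in sqlite3 module as a stand-in for Oracle, loads 1,000 rows instead of 1 million, and uses a hypothetical filter (every tenth empno is left out of the right table) in place of the ename condition. It only checks that the three forms return the same count when the key column cannot be NULL. It says nothing about Oracle timings or plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_data_left  (empno INTEGER PRIMARY KEY, ename TEXT);
    CREATE TABLE test_data_right (empno INTEGER PRIMARY KEY, ename TEXT);
""")

# 1,000 rows on the left; ename is only a placeholder value here
conn.executemany(
    "INSERT INTO test_data_left (empno, ename) VALUES (?, ?)",
    [(i, f"EMP{i}") for i in range(1, 1001)],
)

# Hypothetical filter: every tenth empno is not copied to the right table
conn.execute("""
    INSERT INTO test_data_right (empno, ename)
    SELECT empno, ename FROM test_data_left WHERE empno % 10 <> 0
""")

QUERIES = {
    "LEFT JOIN": """
        SELECT count(*) FROM test_data_left t1
        LEFT JOIN test_data_right t2 ON t1.empno = t2.empno
        WHERE t2.empno IS NULL""",
    "NOT IN": """
        SELECT count(*) FROM test_data_left t1
        WHERE t1.empno NOT IN (SELECT empno FROM test_data_right)""",
    "NOT EXISTS": """
        SELECT count(*) FROM test_data_left t1
        WHERE NOT EXISTS (SELECT 1 FROM test_data_right t2
                          WHERE t1.empno = t2.empno)""",
}

# Run each form and collect the number of left rows with no match on the right
counts = {name: conn.execute(sql).fetchone()[0] for name, sql in QUERIES.items()}
print(counts)
```

With a primary key on both sides the three forms are logically equivalent, which is consistent with Oracle choosing the same plan hash value for all of them in the output further down.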

To collect the statistics of the last executed query, I used this command.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(null,null,'ALLSTATS LAST')) ;

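To be sure that Oracle was not doing anything funny while computing, I reset the database connection before running each query.

Below are the statistics after each query. I repeated the test in reverse order to give LEFT JOIN a fair chance.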

  

As far as I can see, LEFT JOIN is the slowest, while NOT IN and NOT EXISTS are almost the same (based on a few more iterations that I could not capture).

ITERATION 1

LEFT JOIN

SQL_ID  0qz2qtza4yrr0, child number 0
-------------------------------------
select count(*)  from test_data_left t1 left join test_data_right t2 on 
t1.empno=t2.empno where t2.empno is null

Plan hash value: 2082679279

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |      1 |        |      1 |00:00:01.41 |    5012 |
|   1 |  SORT AGGREGATE        |        |      1 |      1 |      1 |00:00:01.41 |    5012 |
|   2 |   NESTED LOOPS ANTI    |        |      1 |   1206K|    900K|00:00:01.32 |    5012 |
|   3 |    INDEX FAST FULL SCAN| TDL_PK |      1 |   1206K|   1000K|00:00:00.22 |    1891 |
|*  4 |    INDEX UNIQUE SCAN   | TDR_PK |   1000K|      1 |  99865 |00:00:00.54 |    3121 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."EMPNO"="T2"."EMPNO")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

NOT EXISTS

SQL_ID  c498qdbzw5dxv, child number 0
-------------------------------------
select count(*) from test_data_left t1 where not exists (select 1 from 
test_data_right t2    where t1.empno=t2.empno)

Plan hash value: 2082679279

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |      1 |        |      1 |00:00:01.27 |    5012 |
|   1 |  SORT AGGREGATE        |        |      1 |      1 |      1 |00:00:01.27 |    5012 |
|   2 |   NESTED LOOPS ANTI    |        |      1 |   1206K|    900K|00:00:01.19 |    5012 |
|   3 |    INDEX FAST FULL SCAN| TDL_PK |      1 |   1206K|   1000K|00:00:00.21 |    1891 |
|*  4 |    INDEX UNIQUE SCAN   | TDR_PK |   1000K|      1 |  99865 |00:00:00.49 |    3121 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."EMPNO"="T2"."EMPNO")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

NOT IN

SQL_ID  gwm775xqnufgm, child number 0
-------------------------------------
select count(*) from test_data_left t1 where t1.empno not in (select 
empno from test_data_right)

Plan hash value: 2082679279

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |      1 |        |      1 |00:00:01.23 |    5012 |
|   1 |  SORT AGGREGATE        |        |      1 |      1 |      1 |00:00:01.23 |    5012 |
|   2 |   NESTED LOOPS ANTI    |        |      1 |   1206K|    900K|00:00:01.15 |    5012 |
|   3 |    INDEX FAST FULL SCAN| TDL_PK |      1 |   1206K|   1000K|00:00:00.19 |    1891 |
|*  4 |    INDEX UNIQUE SCAN   | TDR_PK |   1000K|      1 |  99865 |00:00:00.47 |    3121 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."EMPNO"="EMPNO")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

ITERATION 2

NOT IN

SQL_ID  gwm775xqnufgm, child number 0
-------------------------------------
select count(*) from test_data_left t1 where t1.empno not in (select 
empno from test_data_right)

Plan hash value: 2082679279

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |      1 |        |      1 |00:00:01.19 |    5012 |
|   1 |  SORT AGGREGATE        |        |      1 |      1 |      1 |00:00:01.19 |    5012 |
|   2 |   NESTED LOOPS ANTI    |        |      1 |   1206K|    900K|00:00:01.11 |    5012 |
|   3 |    INDEX FAST FULL SCAN| TDL_PK |      1 |   1206K|   1000K|00:00:00.19 |    1891 |
|*  4 |    INDEX UNIQUE SCAN   | TDR_PK |   1000K|      1 |  99865 |00:00:00.46 |    3121 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."EMPNO"="EMPNO")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

NOT EXISTS

SQL_ID  c498qdbzw5dxv, child number 0
-------------------------------------
select count(*) from test_data_left t1 where not exists (select 1 from 
test_data_right t2    where t1.empno=t2.empno)

Plan hash value: 2082679279

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |      1 |        |      1 |00:00:01.19 |    5012 |
|   1 |  SORT AGGREGATE        |        |      1 |      1 |      1 |00:00:01.19 |    5012 |
|   2 |   NESTED LOOPS ANTI    |        |      1 |   1206K|    900K|00:00:01.12 |    5012 |
|   3 |    INDEX FAST FULL SCAN| TDL_PK |      1 |   1206K|   1000K|00:00:00.19 |    1891 |
|*  4 |    INDEX UNIQUE SCAN   | TDR_PK |   1000K|      1 |  99865 |00:00:00.46 |    3121 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."EMPNO"="T2"."EMPNO")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

LEFT JOIN

SQL_ID  0qz2qtza4yrr0, child number 0
-------------------------------------
select count(*)  from test_data_left t1 left join test_data_right t2 on 
t1.empno=t2.empno where t2.empno is null

Plan hash value: 2082679279

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |      1 |        |      1 |00:00:01.33 |    5012 |
|   1 |  SORT AGGREGATE        |        |      1 |      1 |      1 |00:00:01.33 |    5012 |
|   2 |   NESTED LOOPS ANTI    |        |      1 |   1206K|    900K|00:00:01.24 |    5012 |
|   3 |    INDEX FAST FULL SCAN| TDL_PK |      1 |   1206K|   1000K|00:00:00.22 |    1891 |
|*  4 |    INDEX UNIQUE SCAN   | TDR_PK |   1000K|      1 |  99865 |00:00:00.50 |    3121 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."EMPNO"="T2"."EMPNO")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Answer 1 (score: -1)

This returns everything from table a that has no corresponding record in table b

SELECT a.col FROM a WHERE a.col NOT IN (SELECT b.col from b)
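
One caveat with the NOT IN form is worth illustrating: if the subquery returns a NULL, the NOT IN predicate can never evaluate to true, so the query returns no rows at all. This follows from SQL's three-valued logic and applies to Oracle as well. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the sample values in tables a and b are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col INTEGER);
    CREATE TABLE b (col INTEGER);
    INSERT INTO a (col) VALUES (1), (2), (3);
    -- b contains one matching value and one NULL
    INSERT INTO b (col) VALUES (1), (NULL);
""")

# NOT IN: "2 <> NULL" is unknown, so no row passes the filter
not_in = conn.execute(
    "SELECT a.col FROM a WHERE a.col NOT IN (SELECT b.col FROM b) "
    "ORDER BY a.col"
).fetchall()

# NOT EXISTS: the NULL in b simply never matches, so 2 and 3 are returned
not_exists = conn.execute(
    "SELECT a.col FROM a WHERE NOT EXISTS "
    "(SELECT 1 FROM b WHERE b.col = a.col) ORDER BY a.col"
).fetchall()

print("NOT IN    :", not_in)
print("NOT EXISTS:", not_exists)
```

When b.col is declared NOT NULL (as with the primary keys in the test from answer 0), the two forms return the same rows. Otherwise, adding `WHERE b.col IS NOT NULL` to the subquery or switching to NOT EXISTS avoids the surprise.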

Answer 2 (score: -1)

Try the following query

select tabA.* from tabA left join tabB on tabA.id = tabB.tabA_id
where tabB.tabA_id is null

Hope it helps.