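I've looked around, and although I think my question is basic, I haven't found one that addresses it directly: what is the effect of using DISTINCT in the subqueries versus specifying it in the final SELECT statement, and why?
Given two tables, TABLE_A and TABLE_B, each with one unique variable and two indexes, INDEX_ONE and INDEX_TWO, and with roughly 50 million and 50,000 rows respectively, which use of DISTINCT is more economical?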
SELECT /*+ USE_HASH(A B) LEADING(B A) ALL_ROWS */
DISTINCT
A.INDEX_ONE,
A.INDEX_TWO,
A.VARIABLE_A,
B.VARIABLE_B
FROM (SELECT
INDEX_ONE,
INDEX_TWO,
VARIABLE_A
FROM
TABLE_A) A
INNER JOIN
(SELECT
INDEX_ONE,
INDEX_TWO,
VARIABLE_B
FROM
TABLE_B) B
ON A.INDEX_ONE = B.INDEX_ONE
AND A.INDEX_TWO = B.INDEX_TWO
or
SELECT /*+ USE_HASH(A B) LEADING(B A) ALL_ROWS */
A.INDEX_ONE,
A.INDEX_TWO,
A.VARIABLE_A,
B.VARIABLE_B
FROM (SELECT DISTINCT
INDEX_ONE,
INDEX_TWO,
VARIABLE_A
FROM
TABLE_A) A
INNER JOIN
(SELECT DISTINCT
INDEX_ONE,
INDEX_TWO,
VARIABLE_B
FROM
TABLE_B) B
ON A.INDEX_ONE = B.INDEX_ONE
AND A.INDEX_TWO = B.INDEX_TWO
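I'd also be interested to know whether there is a faster approach than the ones shown here, and especially why.
Edit: After reading keiv.fly's comment, I'll throw this one in as well: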
SELECT /*+ USE_HASH(A B) LEADING(B A) ALL_ROWS */
DISTINCT
A.INDEX_ONE,
A.INDEX_TWO,
A.VARIABLE_A,
B.VARIABLE_B
FROM TABLE_A A
INNER JOIN TABLE_B B
ON A.INDEX_ONE = B.INDEX_ONE
AND A.INDEX_TWO = B.INDEX_TWO
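One way to compare the variants empirically is to look at the execution plan the optimizer produces for each. The following is a minimal sketch, assuming Oracle (which the hint syntax suggests) and a default PLAN_TABLE; the statement ID 'distinct_outer' is an arbitrary label chosen for illustration. Repeating it for the other two queries under different IDs would show where the de-duplication step lands in each plan and what it is estimated to cost.

```sql
-- Assumption: Oracle with the default PLAN_TABLE available.
-- 'distinct_outer' is a hypothetical label used only to find this plan again.
EXPLAIN PLAN SET STATEMENT_ID = 'distinct_outer' FOR
SELECT /*+ USE_HASH(A B) LEADING(B A) ALL_ROWS */
DISTINCT
A.INDEX_ONE,
A.INDEX_TWO,
A.VARIABLE_A,
B.VARIABLE_B
FROM TABLE_A A
INNER JOIN TABLE_B B
ON A.INDEX_ONE = B.INDEX_ONE
AND A.INDEX_TWO = B.INDEX_TWO;

-- Show the stored plan: join method, join order, the de-duplication step,
-- and the optimizer's row and cost estimates.
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'distinct_outer', 'TYPICAL'));
```

These are optimizer estimates rather than measured timings, so they indicate how the work is arranged but do not settle which query is actually faster on the real data.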