带有过滤数据的FULL OUTER JOIN

时间:2013-11-29 14:00:27

标签: sql sql-server outer-join

我似乎在如何最好地解决这个要求方面遇到了一些困境。我意识到这个问题与以下其他问题密切相关:

增加的问题是,我想知道如何解决问题的一般意见。

IF OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A
IF OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B

GO
CREATE TABLE #A (key1   int NOT NULL PRIMARY KEY,
                 value1 int NOT NULL,
                 value2 int NOT NULL,
                 is_even AS (CASE WHEN key1 % 2 = 0 THEN 1 ELSE 0 END))

CREATE TABLE #B (key1   int NOT NULL PRIMARY KEY,
                 value1 int NOT NULL,
                 value2 int NOT NULL,
                 is_even AS (CASE WHEN key1 % 2 = 0 THEN 1 ELSE 0 END))

GO
-- dummy data
INSERT #A (key1, value1, value2)
SELECT TOP 10 key1   = ROW_NUMBER() OVER (ORDER BY x1.object_id),
              value1 = ROW_NUMBER() OVER (ORDER BY x1.object_id) % 7,
              value2 = ROW_NUMBER() OVER (ORDER BY x1.object_id) % 5
  FROM master.sys.objects x1, master.sys.objects x2, master.sys.objects x3

INSERT #B (key1, value1, value2)
SELECT key1, value1, value2
  FROM #A

GO
-- create holes but keep SOME overlap
DELETE #A WHERE value1 > value2 -- removes 3 records
DELETE #B WHERE value1 < value2 -- removes 3 records 

GO
-- show effect on tables
--SELECT * FROM #A ORDER BY key1
--SELECT * FROM #B ORDER BY key1

GO
-- create complete overview
SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM #A a
  FULL OUTER JOIN #B b
               ON b.key1 = a.key1
 ORDER BY 1

GO
-- what if we only want the even records
-- THIS DOES NOT WORK !
SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM #A a
  FULL OUTER JOIN #B b
               ON b.key1 = a.key1
              AND b.is_even = 1
 WHERE a.is_even = 1 
 ORDER BY 1

我知道为什么它不起作用;我只是想知道什么是最明确的方法,使其工作,并保持其他人可读。奖励积分如果它也适用于MSSQL以外的系统。

到目前为止,“我的”解决方案是:

通过OUTER效果捕获NULL:

SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM #A a
  FULL OUTER JOIN #B b
               ON b.key1 = a.key1

 WHERE ISNULL(a.is_even, b.is_even) = 1
 ORDER BY 1

通过CTE

 ;WITH a (key1, value1, value2)
    AS (SELECT key1, value1, value2 
          FROM #A
         WHERE is_even = 1),
       b (key1, value1, value2)
    AS (SELECT key1, value1, value2 
          FROM #B
         WHERE is_even = 1)

 SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM a
  FULL OUTER JOIN b
               ON b.key1 = a.key1
 ORDER BY 1

通过子查询

SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM  (SELECT key1, value1, value2 
           FROM #A
          WHERE is_even = 1) a
  FULL OUTER JOIN (SELECT key1, value1, value2 
                     FROM #B
                    WHERE is_even = 1) b
               ON b.key1 = a.key1
 ORDER BY 1

虽然我更喜欢第一种解决方案,但CTE和/或子查询解决方案看起来更加明显,即使它们在代码中添加了大量的绒毛。 (而且我不太喜欢CTE =)

有什么意见吗?其他方案?备注(例如关于“真实”数据的表现)

1 个答案:

答案 0 :(得分:4)

您的两种方法“使用CTE”和“使用子查询”完全相同,只是个人偏好您将使用哪种方法。

所有3个查询都具有相同的估算成本和相同的I / O:

enter image description here

  

表'#B'。扫描计数1,逻辑读取2,物理读取0,预读取读取0,lob逻辑读取0,lob物理读取0,lob预读读取0。

     

表'#A'。扫描计数1,逻辑读取2,物理读取0,预读取读取0,lob逻辑读取0,lob物理读取0,lob预读读取0。

但是第一步还有一个额外的步骤Filter,因为子查询/ CTE方法能够在聚簇索引扫描的同时应用谓词is_even = 1

enter image description here

所以我会选择子查询方法或CTE方法,具体取决于您在视觉上的偏好。不要误以为在SQL方面总是更少,编写更详细的查询会更有效。