Excluding results from a join that matches on duplicate keys

Asked: 2010-10-19 19:34:08

Tags: sql sql-server tsql

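I work with data from multiple sources that I do not control. These sources often contain duplicates in their "key" values. I need to prevent any of these duplicated values from matching in a join.

Given the following data: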

T1
| ID | FirstKey | SecondKey | ThirdKey | AdditionalColumns   |
+----+----------+-----------+----------+---------------------+
| 01 | Prod1    | ABC1      | 201      | Jun 2010, A, 101    |
| 02 | Prod2    | DEF2      | 202      | May 2009, A, 101    |
| 03 | Prod2    | DEF2      | 202      | May 2010, S, 101    |
| 04 | Prod3    |           | 206      | Jun 2010, A, 103    |
| 05 | Prod4    |           | 207      | Jun 2011, S, 103    |


T2
| ID | FirstKey | SecondKey | ThirdKey | AdditionalColumns   |
+----+----------+-----------+----------+---------------------+
| 01 | Prod1    | ABC1      | 201      | Jun 2010, A, 101    |
| 02 | Prod2    | DEF2      |          | May 2009, A, 101    |
| 03 | Prod2    | DEF2      | 202      | May 2010, S, 101    |
| 04 | Prod3    |           |          | Jun 2010, A, 103    |
| 05 | Prod4    |           | 207      | Jun 2011, S, 103    |
| 06 | Prod1    | ABC1      | 201      | Jun 2010, T, 101    |
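
To try the queries in this thread, the two sample tables above could be set up roughly as follows. This is only a sketch: the column types are assumptions (the question does not give them), blank cells are assumed to be NULL, AdditionalColumns is kept as a single string as it appears in the tables, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Assumed schema: the types and nullability below are guesses, not from the question.
CREATE TABLE T1 (
    ID                char(2)     NOT NULL PRIMARY KEY,
    FirstKey          varchar(10) NOT NULL,
    SecondKey         varchar(10) NULL,
    ThirdKey          int         NULL,
    AdditionalColumns varchar(50) NOT NULL
);

CREATE TABLE T2 (
    ID                char(2)     NOT NULL PRIMARY KEY,
    FirstKey          varchar(10) NOT NULL,
    SecondKey         varchar(10) NULL,
    ThirdKey          int         NULL,
    AdditionalColumns varchar(50) NOT NULL
);

-- Blank cells in the sample tables are loaded as NULL (an assumption).
INSERT INTO T1 (ID, FirstKey, SecondKey, ThirdKey, AdditionalColumns) VALUES
    ('01', 'Prod1', 'ABC1', 201, 'Jun 2010, A, 101'),
    ('02', 'Prod2', 'DEF2', 202, 'May 2009, A, 101'),
    ('03', 'Prod2', 'DEF2', 202, 'May 2010, S, 101'),
    ('04', 'Prod3', NULL,   206, 'Jun 2010, A, 103'),
    ('05', 'Prod4', NULL,   207, 'Jun 2011, S, 103');

INSERT INTO T2 (ID, FirstKey, SecondKey, ThirdKey, AdditionalColumns) VALUES
    ('01', 'Prod1', 'ABC1', 201,  'Jun 2010, A, 101'),
    ('02', 'Prod2', 'DEF2', NULL, 'May 2009, A, 101'),
    ('03', 'Prod2', 'DEF2', 202,  'May 2010, S, 101'),
    ('04', 'Prod3', NULL,   NULL, 'Jun 2010, A, 103'),
    ('05', 'Prod4', NULL,   207,  'Jun 2011, S, 103'),
    ('06', 'Prod1', 'ABC1', 201,  'Jun 2010, T, 101');
```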

Now, if we run this query:

SELECT 
       T1.FirstKey, T1.SecondKey, T1.ThirdKey,
       T2.FirstKey, T2.SecondKey, T2.ThirdKey,
       T1.AdditionalColumns, T2.AdditionalColumns
FROM 
       T1 JOIN T2 ON T1.FirstKey = T2.FirstKey 
          AND T1.SecondKey = T2.SecondKey
          AND T1.SecondKey IS NOT NULL
UNION
SELECT 
       T1.FirstKey, T1.SecondKey, T1.ThirdKey,
       T2.FirstKey, T2.SecondKey, T2.ThirdKey, 
       T1.AdditionalColumns, T2.AdditionalColumns
FROM 
       T1 JOIN T2 ON T1.FirstKey = T2.FirstKey 
          AND T1.ThirdKey = T2.ThirdKey
          AND T1.SecondKey IS NULL

we get the following results:

FirstKey  SecondKey  ThirdKey  FirstKey  SecondKey  ThirdKey  AdditionalColumns  AdditionalColumns
--------  ---------  --------  --------  ---------  --------  -----------------  -----------------
Prod1     ABC1       201       Prod1     ABC1       201       Jun 2010, A, 101   Jun 2010, A, 101
Prod1     ABC1       201       Prod1     ABC1       201       Jun 2010, A, 101   Jun 2010, T, 101
Prod2     DEF2       202       Prod2     DEF2       202       May 2009, A, 101   May 2010, S, 101
Prod2     DEF2       202       Prod2     DEF2       202       May 2010, S, 101   May 2010, S, 101
Prod4     NULL       207       Prod4     NULL       207       Jun 2011, S, 103   Jun 2011, A, 103

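I need the query to return only records with an authoritative match, that is, keys for which there is exactly one match between the tables: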

FirstKey  SecondKey  ThirdKey  FirstKey  SecondKey  ThirdKey  AdditionalColumns  AdditionalColumns
--------  ---------  --------  --------  ---------  --------  -----------------  -----------------
Prod4     NULL       207       Prod4     NULL       207       Jun 2011, S, 103   Jun 2011, A, 103

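Is there a way to do this within the JOIN itself?

Currently, I handle this by creating a CTE for each table that guarantees uniqueness of the keys used in the join. This works, but it is ugly and adds significant work to the query.

Is there another way to perform this join so that duplicated matches are excluded? Assume that I cannot programmatically exclude any of the duplicate rows based on the AdditionalColumns data.

I run into this situation over and over, so the CTE approach feels like a workaround for what must be an already-solved problem.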
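
For readers unfamiliar with the per-table CTE approach mentioned above, it might look roughly like the sketch below. The asker's actual query is not shown, so this is only a guess at its shape: the CTE names T1Unique and T2Unique are hypothetical, and only the FirstKey + SecondKey branch is shown (the ThirdKey branch would be analogous).

```sql
-- Hypothetical sketch; the CTE names are illustrative, not from the question.
WITH T1Unique AS (
    -- (FirstKey, SecondKey) pairs that occur exactly once in T1.
    -- MIN() only surfaces the single value of each one-row group.
    SELECT FirstKey, SecondKey,
           MIN(ThirdKey)          AS ThirdKey,
           MIN(AdditionalColumns) AS AdditionalColumns
    FROM T1
    WHERE SecondKey IS NOT NULL
    GROUP BY FirstKey, SecondKey
    HAVING COUNT(*) = 1
),
T2Unique AS (
    -- The same uniqueness filter applied to T2.
    SELECT FirstKey, SecondKey,
           MIN(ThirdKey)          AS ThirdKey,
           MIN(AdditionalColumns) AS AdditionalColumns
    FROM T2
    WHERE SecondKey IS NOT NULL
    GROUP BY FirstKey, SecondKey
    HAVING COUNT(*) = 1
)
SELECT A.FirstKey, A.SecondKey, A.ThirdKey,
       B.FirstKey, B.SecondKey, B.ThirdKey,
       A.AdditionalColumns, B.AdditionalColumns
FROM T1Unique AS A
JOIN T2Unique AS B
  ON A.FirstKey  = B.FirstKey
 AND A.SecondKey = B.SecondKey;
```

Each table is scanned and aggregated before the join, which is the extra work the question complains about.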

2 Answers:

Answer 0: (score: 1)

How about using GROUP BY in your query:

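-- Group on the T1 key columns only; grouping on every column would give each row a count of 1.
-- MIN() just surfaces the single value of each one-row group.
SELECT M.T1FirstKey, M.T1SecondKey, M.T1ThirdKey,
       MIN(M.T2FirstKey) AS T2FirstKey, MIN(M.T2SecondKey) AS T2SecondKey, MIN(M.T2ThirdKey) AS T2ThirdKey,
       MIN(M.T1AdditionalColumns) AS T1AdditionalColumns, MIN(M.T2AdditionalColumns) AS T2AdditionalColumns
FROM (
SELECT 
       T1.FirstKey AS T1FirstKey, T1.SecondKey AS T1SecondKey, T1.ThirdKey AS T1ThirdKey,
       T2.FirstKey AS T2FirstKey, T2.SecondKey AS T2SecondKey, T2.ThirdKey AS T2ThirdKey,
       T1.AdditionalColumns AS T1AdditionalColumns, T2.AdditionalColumns AS T2AdditionalColumns
FROM 
       T1 JOIN T2 ON T1.FirstKey = T2.FirstKey 
          AND T1.SecondKey = T2.SecondKey
          AND T1.SecondKey IS NOT NULL
UNION ALL  -- the branches are disjoint; UNION would collapse identical rows and hide duplicates
SELECT 
       T1.FirstKey, T1.SecondKey, T1.ThirdKey,
       T2.FirstKey, T2.SecondKey, T2.ThirdKey, 
       T1.AdditionalColumns, T2.AdditionalColumns
FROM 
       T1 JOIN T2 ON T1.FirstKey = T2.FirstKey 
          AND T1.ThirdKey = T2.ThirdKey
          AND T1.SecondKey IS NULL
) AS M  -- SQL Server requires an alias and unique column names on a derived table
GROUP BY M.T1FirstKey, M.T1SecondKey, M.T1ThirdKey
HAVING COUNT(*) = 1;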
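
A related variant, offered only as a sketch, counts the joined rows per join key with a window function and keeps the keys that produce exactly one row. It assumes SQL Server 2005 or later (for CTEs and COUNT(*) OVER) and that blank key values are NULL. The names Matched, MatchCount, and the T1*/T2* column aliases are illustrative and do not come from the question.

```sql
-- Sketch only: Matched, MatchCount and the T1*/T2* aliases are illustrative names.
WITH Matched AS (
    -- Branch 1: match on FirstKey + SecondKey.
    -- NULL never equals NULL, so rows with a NULL SecondKey drop out of this branch.
    SELECT T1.FirstKey AS T1FirstKey, T1.SecondKey AS T1SecondKey, T1.ThirdKey AS T1ThirdKey,
           T2.FirstKey AS T2FirstKey, T2.SecondKey AS T2SecondKey, T2.ThirdKey AS T2ThirdKey,
           T1.AdditionalColumns AS T1AdditionalColumns,
           T2.AdditionalColumns AS T2AdditionalColumns,
           -- Number of joined rows sharing this join key
           COUNT(*) OVER (PARTITION BY T1.FirstKey, T1.SecondKey) AS MatchCount
    FROM T1
    JOIN T2 ON T1.FirstKey  = T2.FirstKey
           AND T1.SecondKey = T2.SecondKey

    UNION ALL  -- the two branches are disjoint, so no de-duplication is needed

    -- Branch 2: fall back to FirstKey + ThirdKey when T1.SecondKey is NULL.
    SELECT T1.FirstKey, T1.SecondKey, T1.ThirdKey,
           T2.FirstKey, T2.SecondKey, T2.ThirdKey,
           T1.AdditionalColumns,
           T2.AdditionalColumns,
           COUNT(*) OVER (PARTITION BY T1.FirstKey, T1.ThirdKey) AS MatchCount
    FROM T1
    JOIN T2 ON T1.FirstKey = T2.FirstKey
           AND T1.ThirdKey = T2.ThirdKey
    WHERE T1.SecondKey IS NULL
)
SELECT T1FirstKey, T1SecondKey, T1ThirdKey,
       T2FirstKey, T2SecondKey, T2ThirdKey,
       T1AdditionalColumns, T2AdditionalColumns
FROM Matched
WHERE MatchCount = 1;  -- keep only keys with a single match between the tables
```

Because the count is taken over the joined rows, a duplicate on either side of the join pushes MatchCount above 1, so no per-table uniqueness CTE is needed. Against the sample data in the question, only the Prod4 row should remain.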

Answer 1: (score: 0)

One suggestion.

Make your entire select a subquery. Let's call it SUBQ.

Then you do this:

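SELECT ThirdKey
FROM (SUBQ) AS S  -- SUBQ stands for the original query; it must expose one uniquely named ThirdKey column
GROUP BY ThirdKey -- no backticks in T-SQL, and SELECT * is not allowed together with GROUP BY
HAVING COUNT(*) = 1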