Duplicates across multiple tables

Time: 2013-09-03 23:20:34

Tags: sql sql-server sql-server-2008 tsql

I need help planning the best course of action for finding duplicates across multiple tables.

DECLARE @Table1 TABLE (ID_T1 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10), Col5 varchar(10), Col6 varchar(10))
DECLARE @Table2 TABLE (ID_T2 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10), Col5 varchar(10), Col6 varchar(10))

INSERT INTO @Table1 (ID_T1, Col1, C2, C3, C4, Col5, Col6)
SELECT 1, 'One', 'Test1', 'Line1', 'Record1', 'OTLR1', 'RLTO1'
UNION ALL
SELECT 2, 'Two', 'Test2', 'Line2', 'Record2', 'OTLR2', 'RLTO2'
UNION ALL
SELECT 3, 'Three', 'Test3', 'Line3', 'Record3', 'OTLR3', 'RLTO3'
UNION ALL
SELECT 4, 'Four', 'Test4', 'Line4', 'Record4', 'OTLR4', 'RLTO4'
UNION ALL
SELECT 5, 'Five', 'Test5', 'Line5', 'Record5', 'OTLR5', 'RLTO5'
UNION ALL
SELECT 6, 'Six', 'Test6', 'Line6', 'Record6', 'OTLR6', 'RLTO6'
UNION ALL
SELECT 7, 'Seven', 'Test6', 'Line6', 'Record6', 'OTLR7', 'RLTO7'
UNION ALL
SELECT 8, 'Eight', 'Test8', 'Line8', 'Record8', 'OTLR8', 'RLTO8'

INSERT INTO @Table2 (ID_T2, Col1, C2, C3, C4, Col5, Col6)
SELECT 10, 'Ten', 'Test1', 'Line1', 'Record1', 'OTLR10', 'RLTO10'
UNION ALL
SELECT 20, 'Twenty', 'Test2', 'Line2', 'Record2', 'OTLR20', 'RLTO20'
UNION ALL
SELECT 30, 'Thirty', 'Test3', 'Line3', 'Record3', 'OTLR30', 'RLTO30'
UNION ALL
SELECT 40, 'Forty', 'Test4', 'Line4', 'Record4', 'OTLR40', 'RLTO40'
UNION ALL
SELECT 50, 'Fifty', 'Test5', 'Line5', 'Record5', 'OTLR50', 'RLTO50'
UNION ALL
SELECT 80, 'Eighty', 'Test80', 'Line80', 'Record80', 'OTLR80', 'RLTO80'
UNION ALL
SELECT 90, 'Ninety', 'Test90', 'Line90', 'Record90', 'OTLR90', 'RLTO90'

SELECT * FROM @Table1
SELECT * FROM @Table2

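Now, C2, C3 and C4 can hold either unique or duplicate values in Table1 and Table2.

I would like to get three outputs. Output 1 will contain only records from Table1, with Duplicate_SameTable flagged 1/0 depending on whether another Table1 record has the same values in columns C2, C3 and C4.

Output 2 will contain only records from Table1, with Duplicate_PrimaryTable flagged 1/0 depending on whether Table2 has a record with the same values in columns C2, C3 and C4.

Output 3 will contain records from both Table1 and Table2, with Duplicate_BothTables flagged 1/0 depending on whether the same C2, C3 and C4 values occur in both tables.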

I can get Output 1 with the following query:

SELECT *, CASE
            WHEN COUNT(*) OVER (PARTITION BY  C2, C3, C4) > 1 THEN 1
            ELSE 0
         END AS Duplicate_SameTable
FROM @Table1
ORDER BY ID_T1 ASC

Output 2

SELECT B.ID, B.Col1, B.C2, B.C3, B.C4, B.Col5, B.Col6,
       CASE WHEN C.Duplicate_SameTable = 1 THEN 0 ELSE B.Duplicate_BothTables END AS Duplicate_PrimaryTable
FROM (
    SELECT ID, Col1, C2, C3, C4, Col5, Col6,
           CASE WHEN COUNT(*) OVER (PARTITION BY C2, C3, C4) > 1 THEN 1 ELSE 0 END AS Duplicate_BothTables
    FROM (
        SELECT ID_T1 AS ID, Col1, C2, C3, C4, Col5, Col6 FROM @Table1
        UNION
        SELECT ID_T2 AS ID, Col1, C2, C3, C4, Col5, Col6 FROM @Table2
    ) A
) B
INNER JOIN (
    SELECT *,
           CASE WHEN COUNT(*) OVER (PARTITION BY C2, C3, C4) > 1 THEN 1 ELSE 0 END AS Duplicate_SameTable
    FROM @Table1
) C ON B.ID = C.ID_T1

Output 3

SELECT B.ID, B.Col1, B.C2, B.C3, B.C4, B.Col5, B.Col6,
       CASE WHEN C.Duplicate_SameTable = 1 THEN 0 ELSE B.Duplicate_BothTables END AS Duplicate_PrimaryTable
FROM (
    SELECT ID, Col1, C2, C3, C4, Col5, Col6,
           CASE WHEN COUNT(*) OVER (PARTITION BY C2, C3, C4) > 1 THEN 1 ELSE 0 END AS Duplicate_BothTables
    FROM (
        SELECT ID_T1 AS ID, Col1, C2, C3, C4, Col5, Col6 FROM @Table1
        UNION
        SELECT ID_T2 AS ID, Col1, C2, C3, C4, Col5, Col6 FROM @Table2
    ) A
) B
LEFT JOIN (
    SELECT *,
           CASE WHEN COUNT(*) OVER (PARTITION BY C2, C3, C4) > 1 THEN 1 ELSE 0 END AS Duplicate_SameTable
    FROM @Table1
) C ON B.ID = C.ID_T1
ORDER BY B.ID

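I would like to know how to get Output 2 and Output 3.

One approach I can think of is to UNION ALL Table1 and Table2 and then run the query above. Is there a better way to do this? The real tables will hold millions of records, and doing a UNION ALL and then applying the query above may take a long time.

Thanks

EDIT: Updated the post with my attempts. They look quite messy, and I'm not sure whether this is a sensible approach performance-wise.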
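The UNION ALL idea from the question can be done in a single pass if every row carries a tag for its source table. A minimal sketch for Output 3, assuming Duplicate_BothTables should be 1 only when the same C2, C3, C4 combination occurs in both tables (not merely twice within one table). Src is a hypothetical tag column added for this purpose, and the data is a trimmed copy of the sample rows above. UNION ALL also skips the duplicate elimination that UNION performs, which the attempts above do not need because the IDs already differ.

```sql
-- Trimmed copy of the question's tables (SQL Server 2008 syntax).
DECLARE @Table1 TABLE (ID_T1 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10), Col5 varchar(10), Col6 varchar(10));
DECLARE @Table2 TABLE (ID_T2 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10), Col5 varchar(10), Col6 varchar(10));

INSERT INTO @Table1 (ID_T1, Col1, C2, C3, C4, Col5, Col6)
SELECT 1, 'One', 'Test1', 'Line1', 'Record1', 'OTLR1', 'RLTO1'
UNION ALL SELECT 6, 'Six', 'Test6', 'Line6', 'Record6', 'OTLR6', 'RLTO6'
UNION ALL SELECT 7, 'Seven', 'Test6', 'Line6', 'Record6', 'OTLR7', 'RLTO7';

INSERT INTO @Table2 (ID_T2, Col1, C2, C3, C4, Col5, Col6)
SELECT 10, 'Ten', 'Test1', 'Line1', 'Record1', 'OTLR10', 'RLTO10'
UNION ALL SELECT 90, 'Ninety', 'Test90', 'Line90', 'Record90', 'OTLR90', 'RLTO90';

-- Single UNION ALL pass; Src (hypothetical) tags the source table:
-- 1 = @Table1, 2 = @Table2.
SELECT ID, Src, Col1, C2, C3, C4, Col5, Col6,
       -- Same key more than once within the row's own table (Output 1 logic).
       CASE WHEN COUNT(*) OVER (PARTITION BY Src, C2, C3, C4) > 1
            THEN 1 ELSE 0 END AS Duplicate_SameTable,
       -- Key present in both tables: its partition contains both tags.
       CASE WHEN MIN(Src) OVER (PARTITION BY C2, C3, C4)
               <> MAX(Src) OVER (PARTITION BY C2, C3, C4)
            THEN 1 ELSE 0 END AS Duplicate_BothTables
FROM (
    SELECT ID_T1 AS ID, 1 AS Src, Col1, C2, C3, C4, Col5, Col6 FROM @Table1
    UNION ALL
    SELECT ID_T2 AS ID, 2 AS Src, Col1, C2, C3, C4, Col5, Col6 FROM @Table2
) AS A
ORDER BY Src, ID;
```

With this data, IDs 1 and 10 get Duplicate_BothTables = 1, and IDs 6 and 7 get Duplicate_SameTable = 1 but Duplicate_BothTables = 0, because Test6/Line6/Record6 exists only in Table1. On millions of rows the window aggregates still need the rows ordered by (C2, C3, C4), so whether this beats the join-based attempts is something to confirm in the actual execution plan.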

1 answer:

Answer 0 (score: 0)

Couldn't you just join the tables? For example:

SELECT t1.*, CASE WHEN t2.ID_T2 IS NULL THEN 0 ELSE 1 END AS Duplicate
FROM @Table1 t1
LEFT JOIN @Table2 t2 ON t1.C2 = t2.C2 AND t1.C3 = t2.C3 AND t1.C4 = t2.C4
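The LEFT JOIN above returns a Table1 row once per matching Table2 row, so a C2, C3, C4 combination that is duplicated inside Table2 would repeat Table1 rows. A semi-join with EXISTS returns each Table1 row exactly once and needs no UNION at all. A minimal sketch of Output 2, assuming Duplicate_PrimaryTable simply means "at least one Table2 row has the same C2, C3, C4". The data is a trimmed copy of the question's sample, and row 11 in @Table2 is a hypothetical extra row added only to show the repetition case.

```sql
-- Trimmed copy of the question's tables. Row 11 in @Table2 is HYPOTHETICAL:
-- it repeats Test1/Line1/Record1 so that Table2 contains a duplicate key.
DECLARE @Table1 TABLE (ID_T1 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10), Col5 varchar(10), Col6 varchar(10));
DECLARE @Table2 TABLE (ID_T2 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10), Col5 varchar(10), Col6 varchar(10));

INSERT INTO @Table1 (ID_T1, Col1, C2, C3, C4, Col5, Col6)
SELECT 1, 'One', 'Test1', 'Line1', 'Record1', 'OTLR1', 'RLTO1'
UNION ALL SELECT 8, 'Eight', 'Test8', 'Line8', 'Record8', 'OTLR8', 'RLTO8';

INSERT INTO @Table2 (ID_T2, Col1, C2, C3, C4, Col5, Col6)
SELECT 10, 'Ten', 'Test1', 'Line1', 'Record1', 'OTLR10', 'RLTO10'
UNION ALL SELECT 11, 'Eleven', 'Test1', 'Line1', 'Record1', 'OTLR11', 'RLTO11';

-- EXISTS is a semi-join: each @Table1 row comes back once, however many
-- @Table2 rows share its C2, C3, C4 values.
SELECT t1.ID_T1, t1.Col1, t1.C2, t1.C3, t1.C4, t1.Col5, t1.Col6,
       CASE WHEN EXISTS (SELECT 1
                         FROM @Table2 AS t2
                         WHERE t2.C2 = t1.C2
                           AND t2.C3 = t1.C3
                           AND t2.C4 = t1.C4)
            THEN 1 ELSE 0
       END AS Duplicate_PrimaryTable
FROM @Table1 AS t1
ORDER BY t1.ID_T1;
```

Here row 1 is returned once with Duplicate_PrimaryTable = 1 and row 8 with 0, whereas the LEFT JOIN version returns row 1 twice. The question's own Output 2 attempt additionally forces the flag to 0 for rows duplicated within Table1; if that is intended, the COUNT(*) OVER (PARTITION BY C2, C3, C4) flag from Output 1 can be added as an extra CASE condition. On the real tables, an index on Table2 (C2, C3, C4) would likely let the EXISTS probe use seeks, but that should be confirmed in the execution plan.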