SQL Server: FULL JOIN select, handling duplicates between Table A and Table B

Time: 2015-02-11 21:18:50

Tags: sql-server

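I have a (to me) complex SQL Server join of two tables in a query. The two tables serve the same function and have a similar structure (but are updated separately), and I am writing a query to find the differences between them so they can be kept more closely in sync. The tables look like this: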

Edit: I added a SQL Fiddle link at the bottom for clarity.

Table A
+-----+---------+----------------+------------+-----------+
| ID  | Product | ValidationDate | ValidValue | ValidHour |
+-----+---------+----------------+------------+-----------+
| 991 |       1 | 1/1/2015       |        100 | 8:00      |
| 992 |       1 | 1/2/2015       |        100 | 8:00      |
| 993 |       1 | 1/2/2015       |        100 | 8:00      |
| 994 |       1 | 1/3/2015       |         88 | 9:00      |
| 995 |       2 | 1/1/2015       |         98 | 9:00      |
| 996 |       2 | 1/1/2015       |         75 | 10:00     |
| 997 |       2 | 1/5/2015       |         80 | 10:00     |
+-----+---------+----------------+------------+-----------+

Table B
+----+---------+----------------+------------+-----------+
| ID | Product | ValidationDate | ValidValue | ValidHour |
+----+---------+----------------+------------+-----------+
|  1 |       1 | 1/1/2015       |        100 | 8:00      |
|  2 |       1 | 1/2/2015       |        100 | 8:00      |
|  3 |       1 | 1/3/2015       |         88 | 9:00      |
|  4 |       2 | 1/1/2015       |         98 | 8:00      |
|  5 |       2 | 1/1/2015       |         75 | 10:00     |
+----+---------+----------------+------------+-----------+

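The query is meant to return the differences between the two tables by Product, ValidationDate, ValidValue and ValidHour. Where there is a mismatch, I want to show the rows from both tables for that particular Product, ValidationDate, ValidValue and ValidHour, like this: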

+---------+-----------+---------+---------+---------+--------+
| Product | ValidDate | A_Value | A_Hour  | B_Value | B_Hour |
+---------+-----------+---------+---------+---------+--------+
|       1 | 1/2/2015  |     100 | 8:00    |         |        |
|       1 | 1/2/2015  |     100 | 8:00    |         |        |
|       1 | 1/2/2015  |         |         |     100 | 8:00   |
|       2 | 1/5/2015  |      80 | 10:00   |         |        |
+---------+-----------+---------+---------+---------+--------+

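As you can see, Table A (IDs 992 and 993) and Table B (ID 2) contain an uneven number of identical entries, so both Table A entries and the single Table B entry are output. Table A also contains an entry that is not found in Table B at all (ID 997), and it is shown as well.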

I initially wrote a statement similar to the following:

SELECT COALESCE(a.Product, b.Product) AS Product,
       COALESCE(a.ValidationDate, b.ValidationDate) AS ValidDate,
       a.ValidValue AS A_Value, a.ValidHour AS A_Hour,
       b.ValidValue AS B_Value, b.ValidHour AS B_Hour
FROM Table_A a
FULL OUTER JOIN Table_B b
  ON a.Product = b.Product AND
     a.ValidationDate = b.ValidationDate AND
     a.ValidValue = b.ValidValue AND
     a.ValidHour = b.ValidHour
WHERE (a.Product IS NULL AND
     a.ValidationDate IS NULL AND
     a.ValidValue IS NULL AND
     a.ValidHour IS NULL)
    OR
    (b.Product IS NULL AND
     b.ValidationDate IS NULL AND
     b.ValidValue IS NULL AND
     b.ValidHour IS NULL)

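However, it doesn't really work. It returns the correct result for rows such as the 1/5/2015 row in the desired output above (a row that exists in one table but not at all in the other), but it ignores uneven duplicate entries between the tables: Table A rows 992 and 993 both join to the single Table B row 2, so none of them has a NULL side and the WHERE clause filters them all out.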
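A minimal repro of the duplicate problem, assuming hypothetical tables dup_demo_a and dup_demo_b that hold only the 1/2/2015 rows from the tables above (dates written in ISO format; the DATE type and multi-row VALUES need SQL Server 2008 or later):

```sql
-- Hypothetical demo tables with only the 1/2/2015 rows from the question.
CREATE TABLE dup_demo_a (ID INT, Product INT, ValidationDate DATE, ValidValue INT, ValidHour VARCHAR(5));
CREATE TABLE dup_demo_b (ID INT, Product INT, ValidationDate DATE, ValidValue INT, ValidHour VARCHAR(5));

INSERT INTO dup_demo_a VALUES (992, 1, '2015-01-02', 100, '8:00'),
                              (993, 1, '2015-01-02', 100, '8:00');
INSERT INTO dup_demo_b VALUES (2, 1, '2015-01-02', 100, '8:00');

-- Both A rows satisfy the ON clause against the one B row,
-- so the join produces two fully populated rows and no NULL side.
SELECT a.ID AS A_ID, b.ID AS B_ID
FROM dup_demo_a a
FULL OUTER JOIN dup_demo_b b
  ON  a.Product = b.Product
  AND a.ValidationDate = b.ValidationDate
  AND a.ValidValue = b.ValidValue
  AND a.ValidHour = b.ValidHour;
```

The join returns the pairs (992, 2) and (993, 2), so any filter that keeps only rows with a NULL side returns nothing for this group, even though the counts differ (2 vs 1).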

How can I accomplish this in the query?

Edit: I created a clearer example of the situation and the desired result in SQL Fiddle, with comments explaining it: http://sqlfiddle.com/#!3/f8a92/4

3 answers:

Answer 0 (score: 0)

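I used two table expressions and joined them on equality of every column except the aggregate column. If the aggregate column differs, the query returns the row. Hope this helps.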

with a as
(
    select Date, Product, ValidValue, SomeValue2,
           count(*) as NumOfOccurences
    from dbo.TableA
    group by Date, Product, ValidValue, SomeValue2
),
b as
(
    select Date, Product, ValidValue, SomeValue2,
           count(*) as NumOfOccurences
    from dbo.TableB
    group by Date, Product, ValidValue, SomeValue2
)
select *
from a
join b on a.Date = b.Date and
          a.Product = b.Product and
          a.ValidValue = b.ValidValue and
          a.SomeValue2 = b.SomeValue2 and
          a.NumOfOccurences <> b.NumOfOccurences;
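Because this uses an inner join, a value combination that exists in only one table (such as 1/5/2015 in the question) is not returned. A sketch of a variant, assuming hypothetical tables cnt_demo_a and cnt_demo_b loaded with the rows from the question's SQL Fiddle (as pasted in Answer 2, dates in ISO format), swaps in a FULL OUTER JOIN and treats a missing count as 0:

```sql
-- Hypothetical copies of the fiddle tables.
CREATE TABLE cnt_demo_a (ID INT, Product INT, Date DATE, ValidValue INT, SomeValue2 INT);
CREATE TABLE cnt_demo_b (ID INT, Product INT, Date DATE, ValidValue INT, SomeValue2 INT);

INSERT INTO cnt_demo_a VALUES (991, 1, '2015-01-01',  75, 7),
                              (992, 1, '2015-01-02', 100, 8),
                              (993, 1, '2015-01-02', 100, 8);
INSERT INTO cnt_demo_b VALUES (1, 1, '2015-01-02', 100, 8),
                              (1, 1, '2015-01-03',  90, 6);

-- Count how often each value combination occurs in each table.
WITH a AS (
    SELECT Product, Date, ValidValue, SomeValue2, COUNT(*) AS NumOfOccurences
    FROM cnt_demo_a
    GROUP BY Product, Date, ValidValue, SomeValue2
), b AS (
    SELECT Product, Date, ValidValue, SomeValue2, COUNT(*) AS NumOfOccurences
    FROM cnt_demo_b
    GROUP BY Product, Date, ValidValue, SomeValue2
)
-- FULL OUTER JOIN keeps combinations missing from either side;
-- a missing side counts as 0 occurrences.
SELECT COALESCE(a.Product, b.Product)       AS Product,
       COALESCE(a.Date, b.Date)             AS Date,
       COALESCE(a.ValidValue, b.ValidValue) AS ValidValue,
       COALESCE(a.SomeValue2, b.SomeValue2) AS SomeValue2,
       COALESCE(a.NumOfOccurences, 0)       AS A_Count,
       COALESCE(b.NumOfOccurences, 0)       AS B_Count
FROM a
FULL OUTER JOIN b
  ON  a.Product = b.Product
  AND a.Date = b.Date
  AND a.ValidValue = b.ValidValue
  AND a.SomeValue2 = b.SomeValue2
WHERE COALESCE(a.NumOfOccurences, 0) <> COALESCE(b.NumOfOccurences, 0);
```

On this data it should return three rows: 1/1/2015 (A_Count 1, B_Count 0), 1/2/2015 (2 vs 1) and 1/3/2015 (0 vs 1). Combinations with equal counts on both sides are dropped.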

Answer 1 (score: 0)

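What you are asking for, while perhaps not impossible, would be very complicated to code. Every DBMS operates on sets, and one of the most important characteristics of a set is that it does not allow duplicates. Once you allow multiple entries with identical non-key values, you are no longer working with a set.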

By far the simplest solution is to truncate TableB and just copy everything over from TableA.

truncate TableB;
insert into TableB( ... )
   select ... from TableA;

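I'm assuming the ID fields in both tables are auto-generated. I'm also assuming something prevents you from using triggers to keep the tables in sync as changes happen; that would be the best long-term solution.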

Answer 2 (score: 0)

OK. Thanks for the sqlfiddle; that makes this fairly straightforward. For completeness, I'm pasting your fiddle here.

CREATE TABLE [dbo].[TableA](
    [ID] [int] NOT NULL,
    [Product] [int] NOT NULL,
    [Date] [datetime] NOT NULL,
    [ValidValue] [int] NOT NULL,
    [SomeValue2] [int] NOT NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[TableB](
    [ID] [bigint] NOT NULL,
    [Product] [int] NOT NULL,
    [Date] [datetime] NOT NULL,
    [ValidValue] [smallint] NOT NULL,
    [SomeValue2] [smallint] NOT NULL
) ON [PRIMARY]

INSERT INTO TableA SELECT 991,1,'1/1/15',75, 7;
INSERT INTO TableA SELECT 992,1,'1/2/15',100,8; --these two are identical values in TableA,
INSERT INTO TableA SELECT 993,1,'1/2/15',100,8; --Possibly erroneous duplication 

INSERT INTO TableB SELECT 1,  1,'1/2/15',100,8; --In TableB, there is only a single matching value
INSERT INTO TableB SELECT 1,  1,'1/3/15',90, 6;

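Now, to be able to identify the duplicates, you also need ROW_NUMBER. We can do that with a CTE for each base table. Then, with a slight change to the join, the query returns the unmatched duplicates from EITHER TableA or TableB.
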
with MyTableA as
(
    -- Number identical rows 1, 2, 3, ... within each set of compared values
    select *
        , ROW_NUMBER() over(partition by Product, Date, ValidValue, SomeValue2 order by ID) as RowNum
    from TableA
)
, MyTableB as
(
    select *
        , ROW_NUMBER() over(partition by Product, Date, ValidValue, SomeValue2 order by ID) as RowNum
    from TableB
)
-- Rows pair up only when their values AND their occurrence number agree,
-- so surplus duplicates on either side come back with a NULL partner.
SELECT  COALESCE(a.Date, b.Date) AS Date,
        COALESCE(a.Product, b.Product) AS Product,
        a.ValidValue A_ValidValue,
        a.SomeValue2 A_SomeValue2,
        b.ValidValue B_ValidValue,
        b.SomeValue2 B_SomeValue2
FROM MyTableA a
FULL OUTER JOIN MyTableB b
  ON a.Product = b.Product AND
     a.Date = b.Date AND
     a.ValidValue = b.ValidValue AND
     a.SomeValue2 = b.SomeValue2 AND
     a.RowNum = b.RowNum
-- ID is NOT NULL in both tables, so a NULL ID means "no partner row"
WHERE a.ID IS NULL OR b.ID IS NULL
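The ROW_NUMBER pairing only lines up when the partition covers every column the join compares; partitioning by Product and Date alone is safe only if no two rows share a Product and Date with different values. A sketch comparing the two choices, assuming a hypothetical table rn_demo_a holding the fiddle's TableA rows plus one made-up row (ID 990) that shares Product and Date with 992/993 but has different values:

```sql
-- Hypothetical demo table: fiddle TableA rows plus a made-up row 990.
CREATE TABLE rn_demo_a (ID INT, Product INT, Date DATE, ValidValue INT, SomeValue2 INT);

INSERT INTO rn_demo_a VALUES (990, 1, '2015-01-02',  90, 6),
                             (991, 1, '2015-01-01',  75, 7),
                             (992, 1, '2015-01-02', 100, 8),
                             (993, 1, '2015-01-02', 100, 8);

SELECT ID,
       -- Numbers rows within each Product/Date only.
       ROW_NUMBER() OVER (PARTITION BY Product, Date
                          ORDER BY ID) AS RowNumByDate,
       -- Numbers rows within each full set of compared values.
       ROW_NUMBER() OVER (PARTITION BY Product, Date, ValidValue, SomeValue2
                          ORDER BY ID) AS RowNumByAllColumns
FROM rn_demo_a
ORDER BY ID;
```

This should give (ID, RowNumByDate, RowNumByAllColumns) = (990, 1, 1), (991, 1, 1), (992, 2, 1), (993, 3, 2). TableB's single (1, 1/2/2015, 100, 8) row gets RowNum 1 under either scheme, so it pairs with 992 only under the all-columns numbering; with the Product/Date numbering, 992, 993 and the TableB row would all be reported as unmatched.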

I deleted my previous answer; the comments included the sqlfiddle link. Here is the link to the original sqlfiddle: http://sqlfiddle.com/#!3/f8a92/11