SQL EXCEPT: how to tell new records from changed records

Time: 2014-04-23 11:37:01

Tags: mysql sql left-join except

Although my post is similar to this one, I still believe it is a different question.

I have 2 CSV files.

File A                                       File B
+-------------------------------------------------------------------+
| Name         | Country                     Name         | Country |
+-------------------------------------------------------------------+
| Ferrari      | Italy                       Jaguar       | British |
| Mercedes     | Germany                     Chevrolet    | America |
| Jaguar       | British                     Bugatti      | Italy   |
| Nissan       | Japan                       Tata         | India   |
| Chevrolet    | USA                         Nissan       | Japan   |
+-------------------------------------------------------------------+

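The above is for illustration only. In general, both files have many more rows and columns, but their structure is the same.

I was asked to do a row-level comparison across all columns, and to do it efficiently. So instead of doing it programmatically, I proposed using HSQLDB: CREATE TEXT TABLE and SET TABLE ... SOURCE for each file, followed by an EXCEPT operation between the two tables. I coded it and it works like a charm. Below is the SQL I wrote for this.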

CREATE TABLE COMPARE_TABLE AS (SELECT SRC.*, 'SRC-TGT' compare_order FROM TABLEA SRC EXCEPT SELECT TGT.*, 'SRC-TGT' compare_order FROM TABLEB TGT) WITH DATA;
INSERT INTO COMPARE_TABLE SELECT TGT.*, 'TGT-SRC' compare_order FROM TABLEB TGT EXCEPT SELECT SRC.*, 'TGT-SRC' compare_order FROM TABLEA SRC;
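
To try the two-way EXCEPT without setting up HSQLDB, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for HSQLDB here (an assumption of this example, not part of the original setup), so the HSQLDB-specific WITH DATA clause is dropped and the sample rows are inserted directly instead of being read from CSV text tables.

```python
import sqlite3

# Sample rows copied from File A and File B in the question.
FILE_A = [("Ferrari", "Italy"), ("Mercedes", "Germany"), ("Jaguar", "British"),
          ("Nissan", "Japan"), ("Chevrolet", "USA")]
FILE_B = [("Jaguar", "British"), ("Chevrolet", "America"), ("Bugatti", "Italy"),
          ("Tata", "India"), ("Nissan", "Japan")]

conn = sqlite3.connect(":memory:")  # SQLite stands in for HSQLDB here
conn.execute("CREATE TABLE TABLEA (Name TEXT, Country TEXT)")
conn.execute("CREATE TABLE TABLEB (Name TEXT, Country TEXT)")
conn.executemany("INSERT INTO TABLEA VALUES (?, ?)", FILE_A)
conn.executemany("INSERT INTO TABLEB VALUES (?, ?)", FILE_B)

# Rows in A but not in B, tagged 'SRC-TGT'.
conn.execute("""
    CREATE TABLE COMPARE_TABLE AS
    SELECT SRC.*, 'SRC-TGT' AS compare_order FROM TABLEA SRC
    EXCEPT
    SELECT TGT.*, 'SRC-TGT' AS compare_order FROM TABLEB TGT
""")
# Rows in B but not in A, tagged 'TGT-SRC'.
conn.execute("""
    INSERT INTO COMPARE_TABLE
    SELECT TGT.*, 'TGT-SRC' AS compare_order FROM TABLEB TGT
    EXCEPT
    SELECT SRC.*, 'TGT-SRC' AS compare_order FROM TABLEA SRC
""")

rows = conn.execute(
    "SELECT Name, Country, compare_order FROM COMPARE_TABLE "
    "ORDER BY compare_order, Name"
).fetchall()
for row in rows:
    print(row)
```

Rows that are identical in both files (Jaguar and Nissan in the sample) drop out in both directions, so only the mismatches remain.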

This gives me a result table like the following (for the sample data above):

COMPARE_TABLE

+------------+----------+---------------+
|   Name     | Country  | Compare_order |
+------------+----------+---------------+
| Ferrari    | Italy    | SRC-TGT       |
| Mercedes   | Germany  | SRC-TGT       |
| Chevrolet  | USA      | SRC-TGT       |
| Chevrolet  | America  | TGT-SRC       |
| Bugatti    | Italy    | TGT-SRC       |
| Tata       | India    | TGT-SRC       |
+------------+----------+---------------+

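From here, I need to determine why each row is a mismatch. At a minimum, I would like to classify each row into one of three categories:

  • New at Source
  • New at Target
  • Value changed (if possible, which columns?)

In the end, I would like my table to look like this: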

COMPARE_TABLE

+------------+----------+---------------+------------------------+
|   Name     | Country  | Compare_order |     Failure_Reason     |
+------------+----------+---------------+------------------------+
| Ferrari    | Italy    | SRC-TGT       | New at Source          |
| Mercedes   | Germany  | SRC-TGT       | New at Source          |
| Chevrolet  | USA      | SRC-TGT       | Country value mismatch |
| Chevrolet  | America  | TGT-SRC       | Country value mismatch |
| Bugatti    | Italy    | TGT-SRC       | New at Target          |
| Tata       | India    | TGT-SRC       | New at Target          |
+------------+----------+---------------+------------------------+

How can I do this? Is it even possible in SQL?

Any help is much appreciated.

1 Answer:

Answer 0: (score: 0)

You can do a basic comparison:

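-- sum(<boolean>) relies on MySQL treating a true comparison as 1.
-- Rows are grouped on (name, country), so a changed country appears as one
-- DROPPED row plus one NEW row; rows present in both tables get a NULL OP.
select name, country,
       (case when sum(which = 'src') > 0 and sum(which = 'tgt') = 0 then 'DROPPED'
             when sum(which = 'src') = 0 and sum(which = 'tgt') > 0 then 'NEW'
        end) as OP
from ((select 'src' as which, name, country
       from tableA
      ) union all
      (select 'tgt', name, country
       from tableB
      )
     ) ab
group by name, country;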
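
The query above sums boolean expressions, which MySQL accepts but many other engines do not. As a hedged sketch, the same grouping can be written with SUM(CASE ...) counters; the version below runs it on the question's sample data through Python's sqlite3 module, with SQLite as an assumed stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only for illustration
conn.execute("CREATE TABLE tableA (name TEXT, country TEXT)")
conn.execute("CREATE TABLE tableB (name TEXT, country TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)", [
    ("Ferrari", "Italy"), ("Mercedes", "Germany"), ("Jaguar", "British"),
    ("Nissan", "Japan"), ("Chevrolet", "USA")])
conn.executemany("INSERT INTO tableB VALUES (?, ?)", [
    ("Jaguar", "British"), ("Chevrolet", "America"), ("Bugatti", "Italy"),
    ("Tata", "India"), ("Nissan", "Japan")])

# Count how many times each (name, country) pair occurs on each side.
QUERY = """
SELECT name, country,
       CASE WHEN SUM(CASE WHEN which = 'src' THEN 1 ELSE 0 END) > 0
             AND SUM(CASE WHEN which = 'tgt' THEN 1 ELSE 0 END) = 0 THEN 'DROPPED'
            WHEN SUM(CASE WHEN which = 'src' THEN 1 ELSE 0 END) = 0
             AND SUM(CASE WHEN which = 'tgt' THEN 1 ELSE 0 END) > 0 THEN 'NEW'
       END AS op
FROM (SELECT 'src' AS which, name, country FROM tableA
      UNION ALL
      SELECT 'tgt', name, country FROM tableB) ab
GROUP BY name, country
ORDER BY name, country
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the grouping key includes country, Chevrolet comes back twice, once as NEW (America) and once as DROPPED (USA), and pairs found in both tables get no label at all.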

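But this does not give you a column-by-column comparison. That is a bit harder. I am assuming that Name is unique, so it can be used as a key. The following does the comparison, but produces one row per name: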

-- 'new' = only in TableB (target), 'dropped' = only in TableA (source).
-- A NULL country is treated the same as a missing row.
select names.name,
       (case when src.country is null then tgt.country
             when tgt.country is null then src.country
             when src.country = tgt.country then src.country
             else concat(src.country, '-->', tgt.country)
        end) as country,
       (case when src.country is null then 'new'
             when tgt.country is null then 'dropped'
             when src.country = tgt.country then 'same'
             else 'changed'
        end) as country_status
from (select name from TableA union select name from TableB
     ) names left outer join
     TableA src
     on names.name = src.name left outer join
     TableB tgt
     on names.name = tgt.name;

Getting multiple rows per name when a column value has changed seems harder, although it is also possible.
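
One possible way to keep one row per mismatch and still label it is to self-join COMPARE_TABLE on Name: if the same name appears with the opposite compare_order, the row changed; otherwise it is new on its own side. The sketch below assumes, as above, that Name is a unique key, and uses Python's sqlite3 module with SQLite as a stand-in engine. The Failure_Reason column name and its labels come from the question's desired output; everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only for illustration
conn.execute(
    "CREATE TABLE COMPARE_TABLE (Name TEXT, Country TEXT, compare_order TEXT)")
# The six mismatch rows shown in the question's COMPARE_TABLE.
conn.executemany("INSERT INTO COMPARE_TABLE VALUES (?, ?, ?)", [
    ("Ferrari", "Italy", "SRC-TGT"), ("Mercedes", "Germany", "SRC-TGT"),
    ("Chevrolet", "USA", "SRC-TGT"), ("Chevrolet", "America", "TGT-SRC"),
    ("Bugatti", "Italy", "TGT-SRC"), ("Tata", "India", "TGT-SRC")])

# o is the counterpart row: same Name, opposite direction.
# No counterpart means the name exists on one side only.
QUERY = """
SELECT c.Name, c.Country, c.compare_order,
       CASE WHEN o.Name IS NULL AND c.compare_order = 'SRC-TGT' THEN 'New at Source'
            WHEN o.Name IS NULL THEN 'New at Target'
            ELSE 'Country value mismatch'
       END AS Failure_Reason
FROM COMPARE_TABLE c
LEFT JOIN COMPARE_TABLE o
       ON o.Name = c.Name AND o.compare_order <> c.compare_order
ORDER BY c.compare_order, c.Name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With only one non-key column, a counterpart row can only differ in Country, so the ELSE branch can name it directly. With more columns, that branch would need one comparison per column (for example c.Country <> o.Country) to report which ones changed.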