Comparing two tables that have no unique key

Time: 2016-11-25 12:53:11

Tags: sql oracle

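I need to compare the data in two tables and check which attributes do not match. The tables have the same definition, but the problem is that I do not have a unique key to compare on. I tried using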

CONCAT(table1.A, table1.B)
= CONCAT(table2.A, table2.B)

but I still get duplicate rows. I also tried NVL on several columns, but that did not work either:

SELECT  
    UT.cat,
    PD.cat
FROM 
    EM UT, EM_63 PD 
WHERE 
    NVL(UT.cat, 1) = NVL(PD.cat, 1) AND
    NVL(UT.AT_NUMBER, 1) = NVL(PD.AT_NUMBER, 1) AND
    NVL(UT.OFFSET, 1) = NVL(PD.OFFSET, 1) AND  
    NVL(UT.PROD, 1) = NVL(PD.PROD, 1)
;

One table has 34k records and the other has 35k, but the query above returns 3 million rows.

Columns in the tables:

COUNTRY       
CATEGORY   
TYPE    
DESCRIPTION

Sample data:

Table 1:

COUNTRY  CATEGORY TYPE   DESCRIPTION       
US          C       T1      In
IN          A       T2      OUT
B           C       T2      IN
Y           C       T1      INOUT

Table 2:

COUNTRY  CATEGORY TYPE   DESCRIPTION
US          C       T2      In
IN          B       T2      Out
Q           C       T2      IN

Expected output:

column      Matched  unmatched
COUNTRY         2       1
CATEGORY        2       1
TYPE            2       1
DESCRIPTION     3       0

3 Answers:

Answer 0 (score: 2)

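In the most general case, you may have duplicate rows, and you want to see which rows exist in one table but not the other, as well as rows that exist in both tables a different number of times (say, 3 times in the first table but 5 times in the other).

This is a very common problem with a settled "best solution" that, for some reason, most people still seem unaware of, even though it was developed on AskTom many years ago and has been presented many times since.

You don't need a join, you don't need a unique key of any kind, and you don't need to read either table more than once. The idea is to add two columns that show which table each row comes from, do a UNION ALL, then GROUP BY all the columns except the "source" columns and show the count for each table. Something like this: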

select   count(t_1) as count_table_1, count(t_2) as count_table_2, col1, col2, ...
from     (
           select 'x' as t_1, null as t_2, col1, col2, ... 
             from table_1
           union all
           select null as t_1, 'x' as t_2, col1, col2, ...
             from table_2
         )
group by col1, col2, ...
having   count(t_1) != count(t_2)
;
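
As an illustration, here is a minimal sketch of the same pattern applied to the sample rows from the question. The inline `table_1` and `table_2` subqueries are stand-ins for the real tables, and the added `order by` is only there to make the output stable.

```sql
-- Sample rows copied from the question; the WITH clause replaces the real tables.
with table_1 as (
  select 'US' as country, 'C' as category, 'T1' as type, 'In' as description from dual union all
  select 'IN', 'A', 'T2', 'OUT'   from dual union all
  select 'B',  'C', 'T2', 'IN'    from dual union all
  select 'Y',  'C', 'T1', 'INOUT' from dual
),
table_2 as (
  select 'US' as country, 'C' as category, 'T2' as type, 'In' as description from dual union all
  select 'IN', 'B', 'T2', 'Out' from dual union all
  select 'Q',  'C', 'T2', 'IN'  from dual
)
select   count(t_1) as count_table_1, count(t_2) as count_table_2,
         country, category, type, description
from     (
           -- tag each row with the table it came from
           select 'x' as t_1, null as t_2, country, category, type, description
             from table_1
           union all
           select null as t_1, 'x' as t_2, country, category, type, description
             from table_2
         )
group by country, category, type, description
having   count(t_1) != count(t_2)   -- keep only rows whose counts differ
order by country, category, type
;
```

With this sample data no row is identical across the two tables (string comparison is case-sensitive, so 'OUT' and 'Out' differ), so all seven rows come back, each with a count of 1 on one side and 0 on the other.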

Answer 1 (score: 1)

Start with this query to check whether these 4 columns actually form a key.

select      occ_total,occ_ut,occ_pd
           ,count(*)                as records

from       (select      count (*)                               as occ_total
                       ,count (case tab when 'UT' then 1 end)   as occ_ut
                       ,count (case tab when 'PD' then 1 end)   as occ_pd

            from        (           select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD from EM
                        union all   select 'PD'       ,cat,AT_NUMBER,OFFSET,PROD from EM_63 PD
                        ) t

            group by    cat,AT_NUMBER,OFFSET,PROD
            ) t

group by    occ_total,occ_ut,occ_pd     

order by    records desc
;
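
To show how the result might be read, here is a minimal sketch on made-up data. The rows are hypothetical and not taken from the question; only the table and column names follow the query above.

```sql
-- Hypothetical rows: key (A, 1, 0, P1) appears twice in EM, once in EM_63.
with em as (
  select 'A' as cat, 1 as at_number, 0 as offset, 'P1' as prod from dual union all
  select 'A', 1, 0, 'P1' from dual union all
  select 'B', 2, 0, 'P2' from dual
),
em_63 as (
  select 'A' as cat, 1 as at_number, 0 as offset, 'P1' as prod from dual union all
  select 'B', 2, 0, 'P2' from dual
)
select      occ_total, occ_ut, occ_pd
           ,count(*) as records
from       (select      count (*)                               as occ_total
                       ,count (case tab when 'UT' then 1 end)   as occ_ut
                       ,count (case tab when 'PD' then 1 end)   as occ_pd
            from        (           select 'UT' as tab, cat, at_number, offset, prod from em
                        union all   select 'PD'       , cat, at_number, offset, prod from em_63
                        ) t
            group by    cat, at_number, offset, prod
            ) t
group by    occ_total, occ_ut, occ_pd
order by    records desc
;
```

This returns two rows: (occ_total, occ_ut, occ_pd) = (2, 1, 1) for the key that appears once in each table, and (3, 2, 1) for the key that is duplicated in EM. Any row where occ_ut or occ_pd exceeds 1 means the chosen columns are not unique in that table, and a join on those columns will multiply the duplicates, which is one plausible reason for a row count far above the size of either table.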

Once you have chosen the "key", you can use the following query to look at the attributes' values:

select      count (*)                               as occ_total
           ,count (case tab when 'UT' then 1 end)   as occ_ut
           ,count (case tab when 'PD' then 1 end)   as occ_pd

           ,count (distinct att1)                   as cnt_dst_att1
           ,count (distinct att2)                   as cnt_dst_att2
           ,count (distinct att3)                   as cnt_dst_att3
           ,...
           ,listagg (case tab when 'UT' then att1 end) within group (order by att1) as att1_vals_ut
           ,listagg (case tab when 'PD' then att1 end) within group (order by att1) as att1_vals_pd
           ,listagg (case tab when 'UT' then att2 end) within group (order by att2) as att2_vals_ut
           ,listagg (case tab when 'PD' then att2 end) within group (order by att2) as att2_vals_pd
           ,listagg (case tab when 'UT' then att3 end) within group (order by att3) as att3_vals_ut
           ,listagg (case tab when 'PD' then att3 end) within group (order by att3) as att3_vals_pd  
           ,...

from        (           select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM
            union all   select 'PD'       ,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM_63 PD
            ) t

group by    cat,AT_NUMBER,OFFSET,PROD
;
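
A reduced sketch of this query, assuming a single key column `cat` and a single attribute `att1` with hypothetical values, may make the output easier to picture. The `','` delimiter passed to `listagg` is an addition for readability.

```sql
-- Hypothetical rows: key B carries a different att1 value in each table.
with em as (
  select 'A' as cat, 'red' as att1 from dual union all
  select 'B', 'green' from dual
),
em_63 as (
  select 'A' as cat, 'red' as att1 from dual union all
  select 'B', 'blue' from dual
)
select      cat
           ,count (distinct att1) as cnt_dst_att1
           ,listagg (case tab when 'UT' then att1 end, ',') within group (order by att1) as att1_vals_ut
           ,listagg (case tab when 'PD' then att1 end, ',') within group (order by att1) as att1_vals_pd
from        (           select 'UT' as tab, cat, att1 from em
            union all   select 'PD'       , cat, att1 from em_63
            ) t
group by    cat
order by    cat
;
```

For key A, cnt_dst_att1 is 1 and both lists show 'red'; for key B, cnt_dst_att1 is 2, with 'green' on the UT side and 'blue' on the PD side. A count above 1 therefore flags a mismatching attribute. Note that COUNT(DISTINCT ...) ignores NULLs, so a NULL on one side and a value on the other would still show a count of 1.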

Answer 2 (score: 0)

The problem with CONCAT is that you can get false matches if your data looks like this:

table1.A = '123'
table1.B = '456'

concatenates to: '123456'

table2.A = '12'
table2.B = '3456'

also concatenates to: '123456'

You have to compare the fields individually: table1.A = table2.A AND table1.B = table2.B
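
The difference can be checked directly with a small sketch. The one-row `table1` and `table2` subqueries below are hypothetical stand-ins built from the values above.

```sql
-- One-row stand-ins holding the example values from this answer.
with table1 as (select '123' as a, '456'  as b from dual),
     table2 as (select '12'  as a, '3456' as b from dual)
select case when concat(table1.a, table1.b) = concat(table2.a, table2.b)
            then 'match' else 'no match' end as concat_compare,
       case when table1.a = table2.a and table1.b = table2.b
            then 'match' else 'no match' end as column_compare
from   table1 cross join table2
;
```

Here concat_compare is 'match' while column_compare is 'no match', which is exactly the false positive described above.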