Better design for eliminating duplicates after a UNION query

Time: 2019-05-09 12:54:36

Tags: sql oracle oracle11g union

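I am designing a UNION query to merge two tables with customer information in an Oracle 11g database. The first table, a, is the "primary" source; the second table, b, is an additional source that contains both new and duplicate entries.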

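A plain UNION cannot eliminate the duplicates in b, because some of the fields that must be selected genuinely differ (for example, the auto-increment ID).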

a

ID  CUSTOMER_NUMBER  NAME  STREET
1   4711             Dirk  Downstreet 4
2   4721             Hans  Mainstreet 5

b

ID  CUSTOMER_NUMBER  NAME   STREET
44  4711             Dirk   Downstreet 4   <== Duplicate
4   4741             Harry  Crossroad 9    <== new

Expected result

ID  CUSTOMER_NUMBER  NAME   STREET        DATASOURCE
1   4711             Dirk   Downstreet 4  SAP         <== from a
2   4721             Hans   Mainstreet 5  SAP         <== from a
4   4741             Harry  Crossroad 9   MANUAL      <== from b

The following simplified test gives me the result I want:

SELECT CUSTOMER_NUMBER, 
    MAX(ID) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) ID,
    MAX(NAME) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) NAME,
    MAX(STREET) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) STREET
FROM 
    (SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
        UNION ALL
    SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
group by CUSTOMER_NUMBER

But I have to select every field with DENSE_RANK FIRST ORDER BY DATASOURCE DESC, and there are about 20 fields...

Can anyone suggest a better option?

1 answer:

Answer 0 (score: 2)

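An alternative to a KEEP clause on every column is to use ROW_NUMBER, partitioned by the unique key and ordered appropriately, and then select only the rows numbered 1.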

Example with CUSTOMER_NUMBER as the unique key, preferring SAP over MANUAL and assuming that ID is unique within each source:

SELECT * FROM 
(
SELECT 
   "ID","CUSTOMER_NUMBER","NAME","STREET",
   -- rank SAP (1) ahead of MANUAL (2); ID breaks ties within a source
   row_number() over (partition by CUSTOMER_NUMBER order by decode(DATASOURCE,'SAP',1,'MANUAL',2), ID) as RN
FROM 
    (SELECT   "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
        UNION ALL
     SELECT   "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
) WHERE RN = 1

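This approach works even if an individual source itself delivers duplicates. Adjust the ORDER BY columns so that the query stays deterministic, i.e. repeated runs return the same result (for example, add the NAME column if ID can be duplicated within SAP).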
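
As a minimal sketch of the ROW_NUMBER approach, the query below inlines the sample rows from the question so it can be run without the real tables. The CTE names (customers, customers_manual, united, ranked) and the extra tie-breaker columns NAME and STREET are illustrative assumptions, not part of the original answer. It also carries DATASOURCE through to the output, as the expected result in the question does.

```sql
-- Sample rows from the question, inlined via DUAL (illustrative stand-ins for the real tables).
WITH customers AS (
    SELECT 1 AS id, 4711 AS customer_number, 'Dirk' AS name, 'Downstreet 4' AS street FROM dual
    UNION ALL
    SELECT 2, 4721, 'Hans', 'Mainstreet 5' FROM dual
),
customers_manual AS (
    SELECT 44 AS id, 4711 AS customer_number, 'Dirk' AS name, 'Downstreet 4' AS street FROM dual
    UNION ALL
    SELECT 4, 4741, 'Harry', 'Crossroad 9' FROM dual
),
-- Tag every row with its source before combining.
united AS (
    SELECT id, customer_number, name, street, 'SAP' AS datasource FROM customers
    UNION ALL
    SELECT id, customer_number, name, street, 'MANUAL' AS datasource FROM customers_manual
),
-- Number the rows of each customer; the preferred row gets rn = 1.
ranked AS (
    SELECT u.*,
           ROW_NUMBER() OVER (
               PARTITION BY customer_number
               ORDER BY DECODE(datasource, 'SAP', 1, 'MANUAL', 2),  -- SAP wins over MANUAL
                        id, name, street                           -- assumed tie-breakers for a stable result
           ) AS rn
    FROM united u
)
SELECT id, customer_number, name, street, datasource
FROM ranked
WHERE rn = 1
ORDER BY customer_number;
```

With these sample rows, customer 4711 exists in both sources, so only its SAP row (ID 1) should survive, while 4721 comes from SAP and 4741 from MANUAL. For the real 20-column tables, only the column lists change; the ranking expression is written once instead of once per column.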