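I am designing a UNION query to merge two tables of customer information in an Oracle 11g database. The first table, a, is the "primary" source; the second table, b, is an additional source that contains both new and duplicate entries.
A plain UNION cannot eliminate the duplicates from b, because some of the fields that must be selected are not actually equal (for example the auto-incremented ID).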
Table a
ID  CUSTOMER_NUMBER  NAME   STREET
1   4711             Dirk   Downstreet 4
2   4721             Hans   Mainstreet 5
Table b
ID  CUSTOMER_NUMBER  NAME   STREET
44  4711             Dirk   Downstreet 4   <== duplicate
4   4741             Harry  Crossroad 9    <== new
Expected result
ID  CUSTOMER_NUMBER  NAME   STREET        DATASOURCE
1   4711             Dirk   Downstreet 4  SAP     <== from a
2   4721             Hans   Mainstreet 5  SAP     <== from a
4   4741             Harry  Crossroad 9   MANUAL  <== from b
The following simplified test works for me:
SELECT CUSTOMER_NUMBER,
       MAX(ID) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) ID,
       MAX(NAME) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) NAME,
       MAX(STREET) KEEP (DENSE_RANK FIRST ORDER BY DATASOURCE DESC) STREET
FROM
  (SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
   UNION ALL
   SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
GROUP BY CUSTOMER_NUMBER
But I have to select every single field through DENSE_RANK FIRST ORDER BY DATASOURCE DESC, and there are about 20 fields...
Can anyone suggest a better option?
Answer 0 (score: 2)
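An alternative to repeating KEEP for every column is to use ROW_NUMBER, partitioned by the unique key with a suitable ordering, and to select only the rows numbered 1.
Example with CUSTOMER_NUMBER as the unique key, preferring SAP over MANUAL, and expecting ID to be unique within each source: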
SELECT * FROM
(
  SELECT
    "ID","CUSTOMER_NUMBER","NAME","STREET",
    -- SAP maps to 1 and MANUAL to 2, so the SAP row sorts first within each customer
    row_number() over (partition by CUSTOMER_NUMBER order by decode(DATASOURCE,'SAP',1,'MANUAL',2), ID) as RN
  FROM
    (SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'SAP' as DATASOURCE FROM CUSTOMERS
     UNION ALL
     SELECT "ID","CUSTOMER_NUMBER","NAME","STREET", 'MANUAL' as DATASOURCE FROM CUSTOMERS_MANUAL) united
) WHERE RN = 1
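To try the ROW_NUMBER approach without touching the real tables, here is a minimal sketch that inlines the sample rows from the question as subquery factoring (WITH) clauses selected from dual. The inline rows, the lowercase names and the extra DATASOURCE output column are assumptions made for illustration; they are not part of the original query.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH customers AS (
  SELECT 1 AS id, 4711 AS customer_number, 'Dirk' AS name, 'Downstreet 4' AS street FROM dual
  UNION ALL
  SELECT 2, 4721, 'Hans', 'Mainstreet 5' FROM dual
),
customers_manual AS (
  SELECT 44 AS id, 4711 AS customer_number, 'Dirk' AS name, 'Downstreet 4' AS street FROM dual
  UNION ALL
  SELECT 4, 4741, 'Harry', 'Crossroad 9' FROM dual
)
SELECT id, customer_number, name, street, datasource
FROM (
  SELECT u.*,
         -- Number the rows of each customer: SAP rows first, then lowest ID.
         ROW_NUMBER() OVER (
           PARTITION BY customer_number
           ORDER BY DECODE(datasource, 'SAP', 1, 'MANUAL', 2), id
         ) AS rn
  FROM (
    SELECT id, customer_number, name, street, 'SAP' AS datasource FROM customers
    UNION ALL
    SELECT id, customer_number, name, street, 'MANUAL' AS datasource FROM customers_manual
  ) u
)
WHERE rn = 1          -- keep only the preferred row per customer
ORDER BY customer_number;
```

With these sample rows the query should return customers 4711 and 4721 from SAP (IDs 1 and 2) and customer 4741 from MANUAL (ID 4), matching the expected result in the question. Listing the columns in the outer SELECT also keeps the helper column RN out of the output.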
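This approach works correctly even if an individual source delivers duplicates of its own. Adjust the ORDER BY columns so that the query stays deterministic, i.e. repeated runs return the same result (for example, add the NAME column if ID can be duplicated within SAP).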
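As a rough sketch of such a tie-breaker, the ordering can be extended with further columns until ties can only occur between rows that are identical in every selected column. Using NAME and STREET here is an assumption; which columns are suitable depends on the actual data.

```sql
SELECT "ID", "CUSTOMER_NUMBER", "NAME", "STREET", DATASOURCE
FROM (
  SELECT united.*,
         ROW_NUMBER() OVER (
           PARTITION BY CUSTOMER_NUMBER
           ORDER BY DECODE(DATASOURCE, 'SAP', 1, 'MANUAL', 2),  -- source priority
                    ID,                                          -- then lowest ID
                    NAME, STREET                                 -- assumed tie-breakers
         ) AS RN
  FROM (
    SELECT "ID", "CUSTOMER_NUMBER", "NAME", "STREET", 'SAP' AS DATASOURCE FROM CUSTOMERS
    UNION ALL
    SELECT "ID", "CUSTOMER_NUMBER", "NAME", "STREET", 'MANUAL' AS DATASOURCE FROM CUSTOMERS_MANUAL
  ) united
)
WHERE RN = 1
```

If two rows still tie on all ordering columns, they carry the same values in every selected column, so whichever one gets RN = 1 the visible result is the same.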