我需要删除行中存在的重复值。 喜欢:
C1 | C2 | C3 | C4 | C5 | C6
----------------------------
1 | 2 | 1 | 2 | 1 | 3
1 | 2 | 1 | 3 | 1 | 4
1 |NULL| 1 |NULL| 1 |NULL
查询的输出应为:
C1 | C2 | C3 | C4 | C5 | C6
----------------------------
1 | 2 | 1 | 3 |NULL|NULL
1 | 2 | 1 | 3 | 1 | 4
1 |NULL|NULL|NULL|NULL|NULL
正如您所看到的,2列的组合应该是唯一的。
第1行中的:
1/2的组合是重复的,所以它被移除,而1/3在c5 / c6中移动到c3 / c4
:
在1 / 2,1 / 3,1 / 4的组合中没有重复,因此结果没有变化
:
所有3种组合都相同,如1 / NULL在所有组合中都存在,因此c3到c6设置为空。
提前致谢
答案 0 :(得分:0)
也许有一种更聪明的方式......但是你可以将它们转换成对,不同(在这种情况下联合就是这样),然后转回来。
with pairs as (
select id, c1 as x, c2 as y from mytable
union
select id, c3, c4 from mytable
union
select id, c5, c6 from mytable
)
select id,
max(decode(rn,1,x)) c1,
max(decode(rn,1,y)) c2,
max(decode(rn,2,x)) c3,
max(decode(rn,2,y)) c4,
max(decode(rn,3,x)) c5,
max(decode(rn,3,y)) c6
from (
select id, x, y, row_number() over (partition by id) rn
from pairs
) as foo
group by id
答案 1 :(得分:0)
这个工作 - 包含在测试中的数据,但可能需要一些时间来理解
提示:取消注释 - debug行下的代码片段,复制脚本直到这些代码片段并将此部分粘贴到SQL提示符中以测试中间结果。
原则是获取行标识符"记住"行;然后垂直旋转 - 不是3列到1列,而是6列到3对列;然后,使用DISTINCT去重复;然后在去掉的中间行的行标识符中获取索引;然后使用该索引再次水平旋转。
像这样:
WITH
input(c1,c2,c3,c4,c5,c6) AS (
SELECT 1, 2,1, 2,1, 3
UNION ALL SELECT 1, 2,1, 3,1, 4
UNION ALL SELECT 1,NULL::INT,1,NULL::INT,1,NULL::INT
)
,
-- need rowid
input_with_rowid AS (
SELECT ROW_NUMBER() OVER() AS rowid, * FROM input
)
,
-- three groupy of 2 columns, so pivot using 3 indexes
idx3(idx) AS (SELECT 1 UNION SELECT 2 UNION SELECT 3)
,
-- pivot vertically, two columns at a time and de-dupe
pivot_pair AS (
SELECT DISTINCT
rowid
, CASE idx
WHEN 1 THEN c1
WHEN 2 THEN c3
WHEN 3 THEN c5
END AS c1
,
CASE idx
WHEN 1 THEN c2
WHEN 2 THEN c4
WHEN 3 THEN c6
END AS c2
FROM input_with_rowid CROSS JOIN idx3
)
-- debug
-- SELECT * FROM pivot_pair ORDER BY rowid;
,
-- add sequence per rowid
pivot_pair_with_seq AS (
SELECT
rowid
, ROW_NUMBER() OVER(PARTITION BY rowid) AS seq
, c1
, c2
FROM pivot_pair
)
-- debug
-- SELECT * FROM pivot_pair_with_seq;
SELECT
rowid
, MAX(CASE seq WHEN 1 THEN c1 END) AS c1
, MAX(CASE seq WHEN 1 THEN c2 END) AS c2
, MAX(CASE seq WHEN 2 THEN c1 END) AS c3
, MAX(CASE seq WHEN 2 THEN c2 END) AS c4
, MAX(CASE seq WHEN 3 THEN c1 END) AS c5
, MAX(CASE seq WHEN 3 THEN c2 END) AS c6
FROM pivot_pair_with_seq
GROUP BY rowid
ORDER BY rowid
;
rowid|c1|c2|c3|c4|c5|c6
1| 1| 2| 1| 3|- |-
2| 1| 2| 1| 3| 1| 4
3| 1|- |- |- |- |-
答案 2 :(得分:0)
将marcothesane的想法与pivot / unpivot运算符结合使用。如果需要对更多输入列进行重复数据删除,则更易于维护。这维持了源数据(列对)的顺序 - 而marcothesane的解决方案可能会重新排序依赖于输入数据的列对。它也比marcothesane慢一点。它仅适用于11R1及以上。
WITH
input(c1,c2,c3,c4,c5,c6) AS (
SELECT 1, 2,1, 2,1, 3 from dual
UNION ALL SELECT 1, 2,1, 3,1, 4 from dual
UNION ALL SELECT 1,NULL ,1,NULL ,1,NULL from dual
)
,
-- need rowid
input_with_rowid AS (
SELECT ROW_NUMBER() OVER (order by 1) AS row_id, input.* FROM input
),
unpivoted_pairs as
(
select row_id, tuple_idx, val1, val2, row_number() over (partition by row_id, val1, val2 order by tuple_idx) as keep_first
from input_with_rowid
UnPivot include nulls(
(val1, val2) --measure
for tuple_idx in ((c1,c2) as 1,
(c3,c4) as 2,
(c5,c6) as 3)
)
)
select row_id,
t1_val1 as c1,
t1_val2 as c2,
t2_val1 as c3,
t2_val2 as c4,
t3_val1 as c5,
t3_val2 as c6
from (
select row_id,
val1, val2, row_number() over (partition by row_id order by tuple_idx) as tuple_order
from unpivoted_pairs
where keep_first = 1
)
pivot (sum(val1) as val1, sum(val2) as val2
for tuple_order in ('1' as t1, '2' as t2, '3' as t3)
)