Remove duplicate (two-column combination) values within a row

Time: 2017-02-21 13:42:05

Tags: sql oracle vertica

I need to remove duplicate values that occur within a single row. For example:

C1 | C2 | C3 | C4 | C5 | C6
----------------------------
1  | 2  |  1 | 2  | 1  | 3
1  | 2  |  1 | 3  | 1  | 4
1  |NULL|  1 |NULL| 1  |NULL

The output of the query should be:

C1 | C2 | C3 | C4 | C5 | C6
----------------------------
1  | 2  |  1 | 3  |NULL|NULL
1  | 2  |  1 | 3  | 1  | 4
1  |NULL|NULL|NULL|NULL|NULL

As you can see, each combination of two columns (c1/c2, c3/c4, c5/c6) should be unique within a row.

In row 1, the combination 1/2 occurs twice, so the duplicate is removed and the 1/3 in c5/c6 moves to c3/c4.

In row 2, the combinations 1/2, 1/3 and 1/4 contain no duplicates, so the result is unchanged.

In row 3, all three combinations are identical (1/NULL each time), so c3 through c6 are set to NULL.
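To pin down the rule described above independently of any SQL dialect, here is a minimal Python sketch of the intended per-row behaviour. The function name `dedupe_pairs` is hypothetical and not part of the question. The sketch assumes that the first occurrence of each pair is kept in its original order, which matches the expected output shown above.

```python
from typing import List, Optional, Sequence


def dedupe_pairs(row: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Keep the first occurrence of each (c1, c2)-style pair and pad with None."""
    # Split the row into consecutive column pairs: (c1, c2), (c3, c4), ...
    pairs = [(row[i], row[i + 1]) for i in range(0, len(row), 2)]
    # dict.fromkeys drops repeated pairs while preserving first-seen order;
    # (1, None) equals (1, None) here, so NULL pairs count as duplicates.
    unique = list(dict.fromkeys(pairs))
    flat = [value for pair in unique for value in pair]
    # Pad on the right so the row keeps its original width.
    return flat + [None] * (len(row) - len(flat))


rows = [
    (1, 2, 1, 2, 1, 3),
    (1, 2, 1, 3, 1, 4),
    (1, None, 1, None, 1, None),
]
for row in rows:
    print(dedupe_pairs(row))
# [1, 2, 1, 3, None, None]
# [1, 2, 1, 3, 1, 4]
# [1, None, None, None, None, None]
```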

Thanks in advance.

3 answers:

Answer 0: (score: 0)

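There may be a smarter way... but you can unpivot the columns into pairs, de-duplicate them (UNION does that here), and then pivot them back. This assumes the table has a unique id column.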

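with pairs as (
    -- one row per column pair; UNION removes duplicate (id, x, y) rows
    -- and treats NULLs as equal when comparing them
    select id, c1 as x, c2 as y from mytable
    union
    select id, c3, c4 from mytable
    union
    select id, c5, c6 from mytable
)
select id,
       max(decode(rn,1,x)) c1,
       max(decode(rn,1,y)) c2,
       max(decode(rn,2,x)) c3,
       max(decode(rn,2,y)) c4,
       max(decode(rn,3,x)) c5,
       max(decode(rn,3,y)) c6
from (
    -- number the surviving pairs per id; Oracle requires an ORDER BY
    -- inside row_number(), and it makes the numbering deterministic
    select id, x, y, row_number() over (partition by id order by x, y) rn
    from pairs
) foo  -- no AS keyword: Oracle rejects AS before a table alias
group by id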

Answer 1: (score: 0)

This works, with the test data included in the query, but it may take some time to understand.

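Hint: uncomment the code snippet under a -- debug line, copy the script up to and including that snippet, and paste that part into your SQL prompt to inspect the intermediate result.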

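The principle is to get a row identifier to "remember" the rows; then pivot vertically, not 3 columns into 1 column, but 6 columns into 3 pairs of columns; then de-dupe using DISTINCT; then get an index within each row identifier for the de-duped intermediate rows; then pivot horizontally again using that index.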

Like this:

WITH
input(c1,c2,c3,c4,c5,c6) AS (
          SELECT 1,        2,1,        2,1,        3
UNION ALL SELECT 1,        2,1,        3,1,        4
UNION ALL SELECT 1,NULL::INT,1,NULL::INT,1,NULL::INT
)
,
-- need rowid
input_with_rowid AS (
SELECT ROW_NUMBER() OVER() AS rowid, * FROM input
)
,
-- three groups of 2 columns, so pivot using 3 indexes
idx3(idx) AS (SELECT 1 UNION SELECT 2 UNION SELECT 3)
,
-- pivot vertically, two columns at a time and de-dupe
pivot_pair AS (
SELECT DISTINCT
  rowid
, CASE idx 
    WHEN 1 THEN c1
    WHEN 2 THEN c3
    WHEN 3 THEN c5
  END AS c1
, 
  CASE idx 
    WHEN 1 THEN c2
    WHEN 2 THEN c4
    WHEN 3 THEN c6
  END AS c2
FROM input_with_rowid CROSS JOIN idx3
)
-- debug
-- SELECT * FROM pivot_pair ORDER BY rowid;
,
-- add sequence per rowid
pivot_pair_with_seq AS (
SELECT
  rowid
, ROW_NUMBER() OVER(PARTITION BY rowid ORDER BY c1, c2) AS seq  -- ORDER BY makes seq deterministic
, c1
, c2
FROM pivot_pair
)
-- debug
-- SELECT * FROM pivot_pair_with_seq;

SELECT
  rowid
, MAX(CASE seq WHEN 1 THEN c1 END) AS c1
, MAX(CASE seq WHEN 1 THEN c2 END) AS c2
, MAX(CASE seq WHEN 2 THEN c1 END) AS c3
, MAX(CASE seq WHEN 2 THEN c2 END) AS c4
, MAX(CASE seq WHEN 3 THEN c1 END) AS c5
, MAX(CASE seq WHEN 3 THEN c2 END) AS c6
FROM pivot_pair_with_seq
GROUP BY rowid
ORDER BY rowid
;

rowid|c1|c2|c3|c4|c5|c6
    1| 1| 2| 1| 3|- |-
    2| 1| 2| 1| 3| 1| 4
    3| 1|- |- |- |- |-
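For readers without a database at hand, the stages of the query above can be sketched in Python. This is only an illustration under stated assumptions: the function name `dedupe_table` is hypothetical, a set stands in for SELECT DISTINCT, and the pairs are numbered in sorted order with NULLs last. That ordering is an assumption of the sketch, not something the original query guarantees.

```python
def dedupe_table(table):
    # Stage 1: attach a row identifier to "remember" each input row.
    with_rowid = list(enumerate(table, start=1))

    # Stage 2: pivot vertically, 6 columns into 3 (x, y) pairs per row.
    # Building a set drops repeated (rowid, x, y) rows, like SELECT DISTINCT.
    distinct_pairs = {
        (rowid, row[2 * idx], row[2 * idx + 1])
        for rowid, row in with_rowid
        for idx in range(3)
    }

    # Stage 3: order the surviving pairs per rowid (None sorts last here,
    # an assumption); a pair's position in its list plays the role of seq.
    def sort_key(item):
        rowid, x, y = item
        return (rowid, x is None, 0 if x is None else x,
                y is None, 0 if y is None else y)

    # Stage 4: pivot horizontally again, padding missing pairs with None.
    pivoted = {rowid: [] for rowid, _ in with_rowid}
    for rowid, x, y in sorted(distinct_pairs, key=sort_key):
        pivoted[rowid].extend((x, y))
    return [
        (rowid, *(values + [None] * (6 - len(values))))
        for rowid, values in pivoted.items()
    ]


table = [
    (1, 2, 1, 2, 1, 3),
    (1, 2, 1, 3, 1, 4),
    (1, None, 1, None, 1, None),
]
for out_row in dedupe_table(table):
    print(out_row)
# (1, 1, 2, 1, 3, None, None)
# (2, 1, 2, 1, 3, 1, 4)
# (3, 1, None, None, None, None, None)
```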

Answer 2: (score: 0)

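This combines marcothesane's idea with the PIVOT/UNPIVOT operators. It is easier to maintain if more input columns need to be de-duplicated. It preserves the order of the source data (the column pairs), whereas marcothesane's solution may reorder the column pairs depending on the input data. It is also a bit slower than marcothesane's solution. It only works on Oracle 11g Release 1 (11R1) and later.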

WITH
input(c1,c2,c3,c4,c5,c6) AS (
          SELECT 1,        2,1,        2,1,        3 from dual
UNION ALL SELECT 1,        2,1,        3,1,        4 from dual
UNION ALL SELECT 1,NULL ,1,NULL ,1,NULL   from dual
)
,
-- need rowid
input_with_rowid AS (
SELECT ROW_NUMBER() OVER (order by 1) AS row_id, input.* FROM input
),
unpivoted_pairs as
(
  select row_id, tuple_idx, val1, val2, row_number() over (partition by row_id, val1, val2 order by tuple_idx) as keep_first
  from input_with_rowid
  UnPivot include nulls(
          (val1, val2)  --measure 
                for tuple_idx in ((c1,c2) as 1,
                                  (c3,c4) as 2,
                                  (c5,c6) as 3)
          )
)
select row_id, 
       t1_val1 as c1,
       t1_val2 as c2,
       t2_val1 as c3,
       t2_val2 as c4,
       t3_val1 as c5,
       t3_val2 as c6
from (
      select row_id,  
             val1, val2, row_number() over (partition by row_id order by tuple_idx) as tuple_order
      from unpivoted_pairs
      where keep_first = 1
      )
pivot (sum(val1) as val1, sum(val2) as val2
       for tuple_order in ('1' as t1, '2' as t2, '3' as t3)
       )
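The ordering difference mentioned in this answer can be seen in a small Python sketch. Both function names are hypothetical. `dedupe_keep_first` mimics the keep_first = 1 filter ordered by tuple_idx, while `dedupe_sorted` shows one possible outcome of a DISTINCT-based approach whose numbering follows the pair values. The sample row is made up for illustration and contains no NULLs.

```python
def dedupe_keep_first(row):
    # Unpivot with an explicit tuple index, like UNPIVOT ... FOR tuple_idx.
    unpivoted = [(idx, row[2 * (idx - 1)], row[2 * (idx - 1) + 1])
                 for idx in (1, 2, 3)]
    seen = set()
    kept = []
    for _tuple_idx, val1, val2 in unpivoted:  # already in tuple_idx order
        if (val1, val2) not in seen:          # corresponds to keep_first = 1
            seen.add((val1, val2))
            kept.append((val1, val2))
    flat = [value for pair in kept for value in pair]
    return flat + [None] * (6 - len(flat))


def dedupe_sorted(row):
    # De-dupe with a set, then order by value: the source order is lost.
    pairs = {(row[i], row[i + 1]) for i in range(0, 6, 2)}
    flat = [value for pair in sorted(pairs) for value in pair]
    return flat + [None] * (6 - len(flat))


row = (1, 3, 1, 2, 1, 3)  # made-up example row
print(dedupe_keep_first(row))  # [1, 3, 1, 2, None, None]
print(dedupe_sorted(row))      # [1, 2, 1, 3, None, None]
```

Both results contain the same unique pairs; only the first keeps 1/3 ahead of 1/2 as in the input.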