Select distinct on each of multiple columns

Time: 2018-10-18 23:40:32

Tags: sql hadoop impala

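I'm looking for any input on how to take the distinct values of each column and process them accordingly. I have a table Z with data as follows: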

CNUM    col1    col2    col3
A1        No     124    30
A1        No     125    31
A2        Yes    123    21
A2         No    123    22

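I need to process this data and load it into another table as a single row per CNUM. The rule is: if any of col1 through col3 has more than one distinct value within a CNUM, that column is carried over as NULL. The result set should look like this: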

CNUM    col1    col2    col3
A1       No     NULL    NULL
A2       NULL   123     NULL 

I started with the SQL below:

Select distinct CNUM,
Case when qty<>1 then null
else col1 END as col1,
Case when qty<>1 then null
else col2 END as col2,
Case when qty<>1 then null
 else col3 END as col3
From Table1
Left join (
  Select distinct CNUM ,qty from 
        (select CNUM, count(*) as qty
         FROM Table1
         Group by CNUM
         having count(*) >1
         )p
    )p1
 on p1.CNUM=Table1.CNUM

But the query above gives me the following result, which is not what I need.

CNUM    col1    col2    col3
 A1       NULL   NULL    NULL
 A2       NULL   NULL     NULL 

col1 for A1 and col2 for A2 should have values and must not be NULL. Any ideas about what I'm doing wrong?

Thanks

1 Answer:

Answer 0: (score: 0)

I have rewritten the whole query; please try the following:

select
  q1.cnum,
  case when col1_distincts>1 then null else col1_val end col1,
  case when col2_distincts>1 then null else col2_val end col2,
  case when col3_distincts>1 then null else col3_val end col3
from (
    select cnum
    from Table1
    group by cnum
  ) q1
  join (
    select c1.cnum,
      c1.col1_distincts, c1.col1_val,
      c2.col2_distincts, c2.col2_val,
      c3.col3_distincts, c3.col3_val
    from (
        select cnum, count(*) col1_distincts, max(col1) col1_val
        from (
          select cnum, col1
          from Table1
          group by cnum, col1
        ) d1 group by cnum
      ) c1
      join (
        select cnum, count(*) col2_distincts, max(col2) col2_val
        from (
          select cnum, col2
          from Table1
          group by cnum, col2
        ) d2 group by cnum
      ) c2 on c1.cnum=c2.cnum
      join (
        select cnum, count(*) col3_distincts, max(col3) col3_val
        from (
          select cnum, col3
          from Table1
          group by cnum, col3
        ) d3 group by cnum
      ) c3 on c1.cnum=c3.cnum
  ) q2 on q1.cnum=q2.cnum
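
For comparison, the same rule can probably be written more compactly with a single GROUP BY: if MIN and MAX of a column agree within a CNUM, the column has exactly one distinct non-NULL value there. The sketch below is not part of the original answer. It assumes Python's sqlite3 module as a local stand-in for Impala and loads only the four sample rows from the question, so treat it as an illustration to adapt rather than a tested Impala query.

```python
import sqlite3

# Assumption: SQLite stands in for Impala so the sketch can run locally.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (CNUM TEXT, col1 TEXT, col2 INTEGER, col3 INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        ("A1", "No", 124, 30),
        ("A1", "No", 125, 31),
        ("A2", "Yes", 123, 21),
        ("A2", "No", 123, 22),
    ],
)

# When MIN and MAX agree, the group has one distinct non-NULL value, so it
# is kept; otherwise the CASE has no matching branch and yields NULL.
query = """
SELECT CNUM,
       CASE WHEN MIN(col1) = MAX(col1) THEN MIN(col1) END AS col1,
       CASE WHEN MIN(col2) = MAX(col2) THEN MIN(col2) END AS col2,
       CASE WHEN MIN(col3) = MAX(col3) THEN MIN(col3) END AS col3
FROM Table1
GROUP BY CNUM
ORDER BY CNUM
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('A1', 'No', None, None)
# ('A2', None, 123, None)
```

Note that MIN and MAX skip NULLs, so a CNUM whose column holds one value plus some NULLs keeps that value. If NULL should count as a separate value, the condition needs an extra check.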

First attempt:

I made some changes to your p1 subquery, as follows:

Select distinct Table1.CNUM,
  Case when qty1<>1 then null
    else col1 END as col1,
  Case when qty2<>1 then null
    else col2 END as col2,
  Case when qty3<>1 then null
    else col3 END as col3
From Table1
  Left join (
     select CNUM, count(distinct col1) as qty1,
       count(distinct col2) as qty2,
       count(distinct col3) as qty3
     FROM Table1 
     Group by CNUM
 ) p1
 on p1.CNUM=Table1.CNUM

I now compute a separate distinct count for each column.

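Your original subquery only checked whether each CNUM has more than one record, so every column was set to NULL whenever a CNUM had multiple rows.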
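
To make the difference concrete, the following sketch compares the row count per CNUM with the per-column distinct counts on the sample data from the question. It assumes Python's sqlite3 module as a local stand-in for Impala; the column aliases qty, qty1, qty2 and qty3 follow the queries above.

```python
import sqlite3

# Assumption: SQLite stands in for Impala so the sketch can run locally.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (CNUM TEXT, col1 TEXT, col2 INTEGER, col3 INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        ("A1", "No", 124, 30),
        ("A1", "No", 125, 31),
        ("A2", "Yes", 123, 21),
        ("A2", "No", 123, 22),
    ],
)

# qty counts rows per CNUM; qty1..qty3 count distinct values per column.
counts = conn.execute(
    """
    SELECT CNUM,
           COUNT(*)             AS qty,
           COUNT(DISTINCT col1) AS qty1,
           COUNT(DISTINCT col2) AS qty2,
           COUNT(DISTINCT col3) AS qty3
    FROM Table1
    GROUP BY CNUM
    ORDER BY CNUM
    """
).fetchall()
for row in counts:
    print(row)
# ('A1', 2, 1, 2, 2)
# ('A2', 2, 2, 1, 2)
```

qty is 2 for both CNUMs, which is why a test on qty alone nulls every column, while qty1 for A1 and qty2 for A2 are 1, so those values survive. Some engines restrict several COUNT(DISTINCT ...) expressions in one query, so it is worth checking how your Impala version handles this form.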