Hive CASE expression produces duplicate rows

Date: 2018-10-02 08:49:20

Tags: sql database hive hql

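I have one table containing contact numbers and a reference table containing a "length" variable and number columns. What I need is the prefix name for each number, where the start of the number matches a prefix in the reference table, but only for the longest matching prefix. (I hope that makes sense.)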

This is what I have tried so far:

select a.record_type,a.number,b.prefix,b.prefix_name 
from first_table a , second_table b 
where  a.transaction_date=20180924 and case  
    when b.length=1 then substr(a.number,1,1)=b.prefix  
    when b.length=2 then substr(a.number,1,2)=b.prefix  
    when b.length=3 then substr(a.number,1,3)=b.prefix  
    when b.length=4 then substr(a.number,1,4)=b.prefix  
    when b.length=5 then substr(a.number,1,5)=b.prefix  
    when b.length=6 then substr(a.number,1,6)=b.prefix  
    when b.length=7 then substr(a.number,1,7)=b.prefix  
    when b.length=8 then substr(a.number,1,8)=b.prefix 
    when b.length=9 then substr(a.number,1,9)=b.prefix 
    when b.length=10 then substr(a.number,1,10)=b.prefix 
    when b.length=11 then substr(a.number,1,11)=b.prefix 
    when b.length=12 then substr(a.number,1,12)=b.prefix 
    when b.length=13 then substr(a.number,1,13)=b.prefix 
    when b.length=14 then substr(a.number,1,14)=b.prefix 
end

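However, it still returns duplicate results. For example, if the number is 12345, it matches the reference rows with prefixes 1234 and 123, whereas I actually want only 1234.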

Is there any way to prioritize the cases? Thanks.

Sample data from both tables: example

My current result and the desired result: results

2 answers:

Answer 0: (score: 0)

OK, I reworked it. Try this:

    WITH FIRST_TABLE (RECORD_TYPE,NUM,TRANSACTION_DATE)AS (
    SELECT 'a',12345, DATE '2018-09-24' FROM DUAL
    ),
    SECOND_TABLE (PREFIX,PREFIX_NAME,LENGTH) AS(
    SELECT 12,'Type A', 2 FROM DUAL union all
    SELECT 1234,'Type B', 4 FROM DUAL 
    )
    select * from (
    SELECT A.RECORD_TYPE,A.NUM,B.PREFIX,B.PREFIX_NAME, MAX(B.PREFIX) OVER (PARTITION BY A.RECORD_TYPE,A.NUM) maxPrefix
    FROM FIRST_TABLE A ,SECOND_TABLE B
    WHERE  A.TRANSACTION_DATE=DATE '2018-09-24' 
    AND A.NUM LIKE (B.PREFIX||'%')
    )
    where PREFIX=maxPrefix;
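
The query above relies on `DUAL`, `||`, and a column list after the CTE name, which may not be available on every Hive version. A minimal sketch of the same idea in more conservative HiveQL follows. It assumes prefixes and numbers are stored as strings, uses made-up sample rows, and uses `max_prefix` as an illustrative alias.

```sql
-- Sketch only: the sample rows are invented for illustration.
with first_table as (
  select 'a' as record_type, '12345' as num, 20180924 as transaction_date
),
second_table as (
  select '12' as prefix, 'Type A' as prefix_name
  union all
  select '1234' as prefix, 'Type B' as prefix_name
)
select t.record_type, t.num, t.prefix, t.prefix_name
from (
  select a.record_type, a.num, b.prefix, b.prefix_name,
         -- All matching prefixes are prefixes of the same number, so the
         -- greatest one in string order is also the longest one.
         max(b.prefix) over (partition by a.record_type, a.num) as max_prefix
  from first_table a cross join second_table b
  where a.transaction_date = 20180924
    and a.num like concat(b.prefix, '%')
) t
where t.prefix = t.max_prefix;
```

With this sample data, both `12` and `1234` match `12345`, and only the `1234` row survives the outer filter.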

Answer 1: (score: 0)

You can use row_number():

select ap.*
from (select a.record_type, a.number, p.prefix, p.prefix_name,
             row_number() over (partition by  a.record_type, a.number order by p.length desc) as seqnum
      from first_table a join
           second_table p
           on (p.length = 1 and substr(a.number, 1, 1) = p.prefix) or
              (p.length = 2 and substr(a.number, 1, 2) = p.prefix) or
              . . .
              (p.length = 14 and substr(a.number, 1, 14) = p.prefix)
      where a.transaction_date = 20180924 
     ) ap
where seqnum = 1;

This can be expressed more concisely as:

select ap.*
from (select a.record_type, a.number, p.prefix, p.prefix_name,
             row_number() over (partition by  a.record_type, a.number order by p.length desc) as seqnum
      from first_table a join
           second_table p
           on substr(a.number, 1, p.length) = p.prefix    
      where a.transaction_date = 20180924 
     ) ap
where seqnum = 1;
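
To see how `row_number()` keeps only the longest match, here is a self-contained sketch with invented sample rows. The names `num` and `prefix_len` are stand-ins for the `number` and `length` columns. The match condition sits in `where` over a `cross join` because some Hive versions reject an `on` condition in which one side refers to both tables; that placement is an assumption about the environment, not a requirement of the technique.

```sql
-- Sketch only: the sample rows are invented for illustration.
with first_table as (
  select 'a' as record_type, '12345' as num, 20180924 as transaction_date
),
second_table as (
  select '12' as prefix, 'Type A' as prefix_name, 2 as prefix_len
  union all
  select '1234' as prefix, 'Type B' as prefix_name, 4 as prefix_len
  union all
  select '99' as prefix, 'Type C' as prefix_name, 2 as prefix_len
)
select ap.record_type, ap.num, ap.prefix, ap.prefix_name
from (select a.record_type, a.num, p.prefix, p.prefix_name,
             -- Rank the matches for each number, longest prefix first.
             row_number() over (partition by a.record_type, a.num
                                order by p.prefix_len desc) as seqnum
      from first_table a cross join second_table p
      where a.transaction_date = 20180924
        and substr(a.num, 1, p.prefix_len) = p.prefix
     ) ap
where ap.seqnum = 1;
```

Here `12` and `1234` both match `12345`, `99` does not, and the row with `seqnum = 1` is the one for `1234`.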

Another approach uses a separate join for each comparison and stops at the first match:

select a.record_type, a.number,
       coalesce(p14.prefix, p13.prefix, . . . , p1.prefix) as prefix,
       coalesce(p14.prefix_name, p13.prefix_name, . . . , p1.prefix_name) as prefix_name
from first_table a left join
     second_table p14
     on p14.length = 14 and substr(a.number, 1, 14) = p14.prefix left join
     second_table p13
     on p13.length = 13 and substr(a.number, 1, 13) = p13.prefix and p14.prefix is null left join
     second_table p12
     on p12.length = 12 and substr(a.number, 1, 12) = p12.prefix and p13.prefix is null left join
     . . .
     second_table p1
     on p1.length = 1 and substr(a.number, 1, 1) = p1.prefix and p2.prefix is null
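
As a shortened illustration of this pattern, the sketch below spells it out for prefix lengths 1 to 4 only, using invented sample rows and the stand-in names `num` and `prefix_len`. It assumes each (prefix, length) pair appears at most once in the reference table; otherwise the left joins would themselves multiply rows. It also leaves out the `is null` conditions, since the order of arguments to `coalesce` alone decides which match wins.

```sql
-- Sketch only: the sample rows are invented for illustration.
with first_table as (
  select 'a' as record_type, '12345' as num, 20180924 as transaction_date
),
second_table as (
  select '12' as prefix, 'Type A' as prefix_name, 2 as prefix_len
  union all
  select '1234' as prefix, 'Type B' as prefix_name, 4 as prefix_len
)
select a.record_type, a.num,
       -- Longest candidate first: the first non-null value is the winner.
       coalesce(p4.prefix, p3.prefix, p2.prefix, p1.prefix) as prefix,
       coalesce(p4.prefix_name, p3.prefix_name,
                p2.prefix_name, p1.prefix_name) as prefix_name
from first_table a
left join second_table p4
  on substr(a.num, 1, 4) = p4.prefix and p4.prefix_len = 4
left join second_table p3
  on substr(a.num, 1, 3) = p3.prefix and p3.prefix_len = 3
left join second_table p2
  on substr(a.num, 1, 2) = p2.prefix and p2.prefix_len = 2
left join second_table p1
  on substr(a.num, 1, 1) = p1.prefix and p1.prefix_len = 1
where a.transaction_date = 20180924;
```

For `12345`, the `p4` join finds `1234` and the `p2` join finds `12`, so `coalesce` returns `1234` and `Type B`.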