Question

I have a table containing the following data:

id  code    data1   data2   country
1   1       A       NULL    IND
1   1       B       B       NZ
1   1                       CA
1   1       C       Z       WI
1   1       D       S       UK
2   2       NULL    NULL    IND
2   2       S       NULL    NZ
2   2       NULL    K       CA
2   2       T       T       WI
2   2       R       K       UK
3   3       NULL    A       WI
3   3       NULL    a       UK

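The records are to be populated according to the priority of the country field. The priority order is IND, NZ, CA, WI, UK.

If data1 or data2 is NULL or blank in a record, the value for that field is taken from the next record in priority order.

So my expected result is the following target table: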

id  code    data1   data2   country
1   1       A       B       IND
2   2       S       K       IND
3   3       NULL    A       WI

Can anyone help me with a query that returns the above result set?

I have added more rows to make the requirement easier to understand.

Answer 1

Hive has the first_value() window function, which can be used for this purpose:

select distinct
    id,
    code,
    first_value(data1) over (
        partition by id, code
        order by (case when data1 is not null then 1 else 2 end),
                 (case country when 'IND' then 1 when 'NZ' then 2 when 'CA' then 3 when 'WI' then 4 when 'UK' then 5 else 6 end)
    ) as data1,
    first_value(data2) over (
        partition by id, code
        order by (case when data2 is not null then 1 else 2 end),
                 (case country when 'IND' then 1 when 'NZ' then 2 when 'CA' then 3 when 'WI' then 4 when 'UK' then 5 else 6 end)
    ) as data2,
    -- country is ordered by priority alone, so id 2 yields IND as in the expected result
    first_value(country) over (
        partition by id, code
        order by (case country when 'IND' then 1 when 'NZ' then 2 when 'CA' then 3 when 'WI' then 4 when 'UK' then 5 else 6 end)
    ) as country
from t;

I am not a big fan of select distinct with window functions. In this case, it seems like the simplest solution.
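To make the ordering inside each window concrete, here is a minimal Python sketch (not HiveQL) of what first_value does with a two-part sort key: non-NULL values first, then country priority. The names PRIORITY, rows and first_value are made up for this illustration; the rows are the id = 2 group from the question.

```python
# Illustration only: models first_value(col) over (... order by NULL flag, priority).
PRIORITY = {"IND": 1, "NZ": 2, "CA": 3, "WI": 4, "UK": 5}

# (data1, data2, country) rows for id = 2 from the question; None stands for NULL.
rows = [
    (None, None, "IND"),
    ("S", None, "NZ"),
    (None, "K", "CA"),
    ("T", "T", "WI"),
    ("R", "K", "UK"),
]

def first_value(rows, col):
    """Return column `col` of the first row after sorting by (NULL flag, priority)."""
    def key(row):
        null_flag = 2 if row[col] is None else 1     # mirrors: case when ... then 1 else 2 end
        return (null_flag, PRIORITY.get(row[2], 6))  # unlisted countries sort last
    return min(rows, key=key)[col]

data1 = first_value(rows, 0)
data2 = first_value(rows, 1)
print(data1, data2)
```

Because the NULL flag is the leading part of the key, a lower-priority country with a value always beats a higher-priority country without one.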

Answer 2

Use a CASE expression to derive the priority, then apply first_value over it.

select id, max(code), max(data1), max(data2), max(country)
from (
    select
        id,
        code,
        first_value(data1) over (partition by id 
            order by case when data1 is null or data1 = '' then 1 else 0 end * 10 + priority) data1,
        first_value(data2) over (partition by id 
            order by case when data2 is null or data2 = '' then 1 else 0 end * 10 + priority) data2,
        first_value(country) over (partition by id 
            order by case when country is null or country = '' then 1 else 0 end * 10 + priority) country
    from (
        select
            t.*,
            case country
                when 'IND' then 1
                when 'NZ' then 2
                when 'CA' then 3
                when 'WI' then 4
                when 'UK' then 5
            end priority
        from your_table t
    ) t
) t group by id;

Produces:

ID  MAX(CODE)   MAX(DATA1)  MAX(DATA2)  MAX(COUNTRY)
1   1           A           B           IND
2   2           S           K           IND
3   3           NULL        A           WI
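The ordering expression in this query folds the "is empty" flag and the priority into a single number: non-empty rows get keys 1 to 5 and empty rows get 11 to 15, so every non-empty row sorts ahead of every empty one. A small Python sketch of that arithmetic, using the data2 column of id = 1; sort_key and data2_rows are hypothetical names for this illustration.

```python
PRIORITY = {"IND": 1, "NZ": 2, "CA": 3, "WI": 4, "UK": 5}

def sort_key(value, country):
    """Model of: case when value is null or value = '' then 1 else 0 end * 10 + priority."""
    empty = 1 if value is None or value == "" else 0
    return empty * 10 + PRIORITY[country]

# data2 values for id = 1 from the question (None stands for NULL, '' for blank).
data2_rows = [(None, "IND"), ("B", "NZ"), ("", "CA"), ("Z", "WI"), ("S", "UK")]

keys = [sort_key(value, country) for value, country in data2_rows]
winner = min(data2_rows, key=lambda row: sort_key(*row))[0]
print(keys, winner)
```

The multiplier 10 only works while there are fewer than ten priority levels; with a longer country list it would need to grow accordingly.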

Edit:

You can also use the FIELD function (available in Hive and MySQL) to generate the priority, as @Dudu suggested in the comments below:

field(country,'IND','NZ','CA','WI','UK')

See:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
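As described in the Hive UDF documentation linked above, field(val, val1, val2, ...) returns the 1-based index of val in the list, or 0 if val is not found or is NULL. The Python function below is only a stand-in that models this behaviour; the country 'US' is a made-up example of a value outside the list.

```python
def field(val, *candidates):
    """Python stand-in for Hive's field(): 1-based position of val, or 0 if absent or NULL."""
    if val is None:
        return 0
    for position, candidate in enumerate(candidates, start=1):
        if val == candidate:
            return position
    return 0

order = ("IND", "NZ", "CA", "WI", "UK")
result = (field("IND", *order), field("WI", *order), field("US", *order))
print(result)
```

One consequence worth keeping in mind: a country outside the list gets 0, which sorts ahead of IND in ascending order. If unlisted countries should come last instead, the 0 would have to be mapped to a large number first.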

Answer 3

Another approach, based on MIN of a STRUCT.

For the ordering I used the function field (field(country,'IND','NZ','CA','WI','UK')). Since it was missing from the documentation, I have added it: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

select      id
           ,min (code)                                                                                                           as code
           ,min (case when coalesce(trim(data1),'') <> '' then struct(field(country,'IND','NZ','CA','WI','UK'),data1) end).col2  as data1
           ,min (case when coalesce(trim(data2),'') <> '' then struct(field(country,'IND','NZ','CA','WI','UK'),data2) end).col2  as data2
           ,min (struct(field(country,'IND','NZ','CA','WI','UK'),country)).col2                                                  as country

from        mytable

group by    id

order by    id
;
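The query relies on structs being compared field by field, so the minimum struct is the one with the smallest priority, and .col2 then extracts the value carried along with it. Python tuples compare the same way, which allows a rough sketch of the idea; the helper pick and the sample lists are assumptions for illustration, not part of Hive.

```python
ORDER = ("IND", "NZ", "CA", "WI", "UK")

def pick(rows):
    """Model of min(case when value is non-blank then struct(priority, value) end).col2."""
    candidates = [
        (ORDER.index(country) + 1, value)             # struct(field(country, ...), value)
        for value, country in rows                    # assumes every country is in ORDER
        if value is not None and value.strip() != ""  # coalesce(trim(value), '') <> ''
    ]
    # min() over no rows yields NULL in SQL; None plays that role here.
    return min(candidates)[1] if candidates else None

# data1 for id = 1 and id = 3 from the question (None = NULL, '' = blank).
data1_id1 = [("A", "IND"), ("B", "NZ"), ("", "CA"), ("C", "WI"), ("D", "UK")]
data1_id3 = [(None, "WI"), (None, "UK")]

print(pick(data1_id1), pick(data1_id3))
```

Rows whose value is NULL or blank never produce a tuple, so they cannot win the minimum regardless of their priority.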

Demo

create table mytable 
(
    id      int
   ,code    int
   ,data1   string
   ,data2   string
   ,country string
);

insert into mytable values

    (1 ,1 ,'A'  ,NULL ,'IND')
   ,(1 ,1 ,'B'  ,'B'  ,'NZ' )
   ,(1 ,1 ,''   ,''   ,'CA' )
   ,(1 ,1 ,'C'  ,'Z'  ,'WI' )
   ,(1 ,1 ,'D'  ,'S'  ,'UK' )
   ,(2 ,2 ,NULL ,NULL ,'IND')
   ,(2 ,2 ,'S'  ,NULL ,'NZ' )
   ,(2 ,2 ,NULL ,'K'  ,'CA' )
   ,(2 ,2 ,'T'  ,'T'  ,'WI' )
   ,(2 ,2 ,'R'  ,'K'  ,'UK' )
   ,(3 ,3 ,NULL ,'A'  ,'WI' )
   ,(3 ,3 ,NULL ,'a'  ,'UK' )
;

select      id
           ,min (code)                                                                                                           as code
           ,min (case when coalesce(trim(data1),'') <> '' then struct(field(country,'IND','NZ','CA','WI','UK'),data1) end).col2  as data1
           ,min (case when coalesce(trim(data2),'') <> '' then struct(field(country,'IND','NZ','CA','WI','UK'),data2) end).col2  as data2
           ,min (struct(field(country,'IND','NZ','CA','WI','UK'),country)).col2                                                  as country

from        mytable

group by    id

order by    id
;

+----+------+-------+-------+---------+
| id | code | data1 | data2 | country |
+----+------+-------+-------+---------+
|  1 |    1 | A     | B     | IND     |
|  2 |    2 | S     | K     | IND     |
|  3 |    3 | NULL  | A     | WI      |
+----+------+-------+-------+---------+
