sort_array ordered by a different column, Hive

Time: 2017-04-14 17:04:54

Tags: sorting hadoop hive

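I have two columns: one holds products and the other holds the dates they were purchased. I can order the dates by applying sort_array(date), but I would like the result of collecting the products to be sorted by purchase date as well. Is there a way to do this in Hive?

The table is: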

ClientID    Product    Date
100    Shampoo    2016-01-02
101    Book    2016-02-04
100    Conditioner    2015-12-31
101    Bookmark    2016-07-10
100    Cream    2016-02-12
101    Book2    2016-01-03

Then I collapse it to one row per client:

select
clientID,
COLLECT_LIST(Product) as Prod_List,
sort_array(COLLECT_LIST(date)) as Date_Order
from tablename
group by clientID;

which gives:

ClientID    Prod_List    Date_Order
100    ["Shampoo","Conditioner","Cream"]    ["2015-12-31","2016-01-02","2016-02-12"]
101    ["Book","Bookmark","Book2"]    ["2016-01-03","2016-02-04","2016-07-10"]

But what I want is for the products to be listed in the same chronological order as their purchase dates.

1 answer:

Answer 0 (score: 2)

It can be done using only built-in functions, but it is not a pretty sight :-)

select      clientid
           ,split(regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',cast(date as string),product)))),'[^:]*:([^,]*(,|$))','$1'),',') as prod_list
           ,sort_array(collect_list(date)) as date_order

from        tablename 

group by    clientid
; 
+----------+-----------------------------------+------------------------------------------+
| clientid |             prod_list             |                date_order                |
+----------+-----------------------------------+------------------------------------------+
|      100 | ["Conditioner","Shampoo","Cream"] | ["2015-12-31","2016-01-02","2016-02-12"] |
|      101 | ["Book2","Book","Bookmark"]       | ["2016-01-03","2016-02-04","2016-07-10"] |
+----------+-----------------------------------+------------------------------------------+
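
To see why the one-liner works, it may help to split it into stages. The following is only a sketch that reuses the table, columns and functions from the query above; the aliases `tagged`, `joined` and `stripped` are illustrative names that do not appear in the original answer.

```sql
-- Sketch: the answer's pipeline, one stage per column.
-- Assumes dates are yyyy-MM-dd, so sorting them as strings is chronological.
select  clientid
       ,tagged                                                        -- each product prefixed with its date, sorted
       ,joined                                                        -- the sorted array flattened to one string
       ,regexp_replace(joined,'[^:]*:([^,]*(,|$))','$1') as stripped  -- date prefixes removed
       ,split(regexp_replace(joined,'[^:]*:([^,]*(,|$))','$1'),',') as prod_list
from   (select  clientid
               ,tagged
               ,concat_ws(',',tagged) as joined
        from   (select  clientid
                       ,sort_array(collect_list(concat_ws(':',cast(date as string),product))) as tagged
                from    tablename
                group by clientid
               ) t1
       ) t2
;
```

For client 100, `tagged` would be ["2015-12-31:Conditioner","2016-01-02:Shampoo","2016-02-12:Cream"], `joined` would be the same three items separated by commas, and `stripped` would be "Conditioner,Shampoo,Cream". Note that the trick relies on the comma as a separator, so a product name that itself contains a comma would be split into two entries.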