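I have two columns: one is the product and the other is the date it was purchased. I can order the dates by applying sort_array(dates), but I want the sort_array(products) result to be ordered by purchase date as well. Is there a way to do this in Hive?
My table (tablename) looks like this: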
ClientID Product Date
100 Shampoo 2016-01-02
101 Book 2016-02-04
100 Conditioner 2015-12-31
101 Bookmark 2016-07-10
100 Cream 2016-02-12
101 Book2 2016-01-03
I then collapse it to one row per client:
select
clientID,
COLLECT_LIST(Product) as Prod_List,
sort_array(COLLECT_LIST(`date`)) as Date_Order
from tablename
group by clientID;
which gives:
ClientID Prod_List Date_Order
100 ["Shampoo","Conditioner","Cream"] ["2015-12-31","2016-01-02","2016-02-12"]
101 ["Book","Bookmark","Book2"] ["2016-01-03","2016-02-04","2016-07-10"]
But what I want is for the products to appear in the same chronological order as their purchase dates.
Answer:
It can be done with built-in functions only, but it's not a pretty sight :-)
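select clientid
      -- build "date:product" strings, sort them (ISO dates sort chronologically),
      -- join with commas, strip each "date:" prefix, then split back into an array
      ,split(regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',cast(`date` as string),product)))),'[^:]*:([^,]*(,|$))','$1'),',') as prod_list
      ,sort_array(collect_list(`date`)) as date_order
from tablename
group by clientid
;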
+----------+-----------------------------------+------------------------------------------+
| clientid | prod_list | date_order |
+----------+-----------------------------------+------------------------------------------+
| 100 | ["Conditioner","Shampoo","Cream"] | ["2015-12-31","2016-01-02","2016-02-12"] |
| 101 | ["Book2","Book","Bookmark"] | ["2016-01-03","2016-02-04","2016-07-10"] |
+----------+-----------------------------------+------------------------------------------+
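To see why the one-liner works, here is a minimal Python sketch that replays the same string steps outside Hive on the sample rows from the question. It is only an illustration, not Hive itself: the function name prod_list_by_date is hypothetical, and Python's re.sub writes the group reference as \1 where Hive's regexp_replace (Java regex) uses $1.

```python
import re
from collections import defaultdict

# Sample rows from the question: (client_id, product, ISO date string).
ROWS = [
    (100, "Shampoo", "2016-01-02"),
    (101, "Book", "2016-02-04"),
    (100, "Conditioner", "2015-12-31"),
    (101, "Bookmark", "2016-07-10"),
    (100, "Cream", "2016-02-12"),
    (101, "Book2", "2016-01-03"),
]


def prod_list_by_date(rows):
    """Hypothetical helper mimicking the Hive query: products per client, ordered by date."""
    groups = defaultdict(list)
    for client_id, product, date in rows:
        # concat_ws(':', cast(date as string), product), collected per client
        groups[client_id].append(f"{date}:{product}")

    result = {}
    for client_id, pairs in groups.items():
        # sort_array(...) then concat_ws(',', ...); ISO dates sort lexicographically
        joined = ",".join(sorted(pairs))
        # regexp_replace(..., '[^:]*:([^,]*(,|$))', '$1'): drop each "date:" prefix
        stripped = re.sub(r"[^:]*:([^,]*(,|$))", r"\1", joined)
        # split(..., ',')
        result[client_id] = stripped.split(",")
    return result


if __name__ == "__main__":
    for client_id, products in sorted(prod_list_by_date(ROWS).items()):
        print(client_id, products)
```

The trick depends on two assumptions about the data: product names must not contain commas (the final split would break them apart), and the date text must not contain colons (the regex treats the first colon as the end of the prefix). When two purchases share a date, the string sort breaks the tie by product name.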