Table1:
Column1  Column2
1        1,2,10
2        11,12,13
3        1,2,14
4        20,1,10
5        11,12,13,14
Table2:
Column1  Column2
1        Purchase
2        Product View
10       Cart Open
11       Checkout
12       Cart Add
13       Cart Remove
14       Cart View
20       Campaign View
Expected result:
Column1  Column2      DESC
1        1,2,10       Purchase, Product View, Cart Open
2        11,12,13     Checkout, Cart Add, Cart Remove
3        1,2,14       Purchase, Product View, Cart View
4        20,1,10      Campaign View, Purchase, Cart Open
5        11,12,13,14  Checkout, Cart Add, Cart Remove, Cart View
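Note:
When an element of Table1.Column2 (for example Column2[0]) equals Table2.Column1, the matching Table2.Column2 value should appear in the DESC column of the new result table.
Can we use a join in this query? If so, how can we do it in Hive?
Please help me solve this problem.
Thanks in advance, Anbu k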
Answer 0 (score: 0)
Query:
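-- Register the Brickhouse collect UDAF (adjust the jar path to your installation)
add jar /path/to/jars/brickhouse-0.7.1.jar;
create temporary function collect as "brickhouse.udf.collect.CollectUDAF";

-- Explode the IDs in col2 (assumed here to be an array column), look each one up
-- in tbl2, then collect the IDs and descriptions back into one row per col1
select a.col1
     , collect(b.col1)
     , collect(b.col2)
from (
  select col1, exp_col2
  from db.tbl1
  lateral view explode(col2) exptbl as exp_col2 ) a
join db.tbl2 b
  on b.col1 = a.exp_col2
group by a.col1;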
Output:
1 [1, 2, 10] ["Purchase","Product View","Cart Open"]
2 [11, 12, 13] ["Checkout","Cart Add","Cart Remove"]
3 [1, 2, 14] ["Purchase","Product View","Cart View"]
4 [1, 10, 20] ["Purchase","Cart Open","Campaign View"]
5 [11, 12, 13, 14] ["Checkout","Cart Add","Cart Remove","Cart View"]
Be sure to use Brickhouse's collect rather than the built-in collect_list(), because the latter does not (necessarily) preserve order.
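The query above calls explode(col2) directly, which only works if col2 is an array column. If it is instead stored as a comma-separated string, as the question's sample data may suggest, it would need to be split first. A minimal sketch under those assumptions (col2 is a STRING with no spaces around the commas, tbl2.col1 is an INT, and Brickhouse's collect is registered as shown in the query above):

```sql
-- Assumption: db.tbl1.col2 is a STRING such as '1,2,10', not an array.
-- Assumption: db.tbl2.col1 is an INT, hence the explicit cast in the join.
select a.col1
     , collect(b.col1) as ids
     , collect(b.col2) as descs
from (
  select col1, exp_col2
  from db.tbl1
  -- split() turns the string into array<string>; explode() emits one row per element
  lateral view explode(split(col2, ',')) exptbl as exp_col2
) a
join db.tbl2 b
  on b.col1 = cast(a.exp_col2 as int)
group by a.col1;
```

Because this is an inner join, any ID in col2 that has no match in tbl2 is silently dropped from both collected lists.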