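I have a table in which one column holds an array (a list of values). I run a lateral view explode on this table to get the individual elements, but doing so also multiplies the metric values.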
Table:
Sitedomain Keyword Clicks
msn.com sports,cricket,accessories 100
yahoo.com fashion,accessories 50
After one lateral view explode, my output looks like this:
Sitedomain Keyword Clicks
msn.com sports 100
msn.com cricket 100
msn.com accessories 100
yahoo.com fashion 50
yahoo.com accessories 50
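As you can see, the metric is repeated on every exploded row, so the totals are inflated. Is there a way to normalize the data so that the metric is divided by the number of elements in the array? The output would then look like this: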
Sitedomain Keyword Clicks
msn.com sports 33.3
msn.com cricket 33.3
msn.com accessories 33.3
yahoo.com fashion 25
yahoo.com accessories 25
Answer 0 (score: 1)
Divide the clicks by the size of the keyword array:
-- sample data standing in for the real table
with your_table as (
  select stack(2,
           'msn.com',   'sports,cricket,accessories', 100,
           'yahoo.com', 'fashion,accessories',        50
         ) as (Sitedomain, Keyword, Clicks)
)
select s.Sitedomain,
       k.keyword,
       -- split each row's clicks evenly across its keywords
       round(s.Clicks / size(s.Keyword_array), 1) as Clicks
from (
       select Sitedomain,
              split(Keyword, ',') as Keyword_array,  -- comma-separated string -> array
              Clicks
       from your_table
     ) s
lateral view explode(s.Keyword_array) k as keyword
;
Returns:
msn.com sports 33.3
msn.com cricket 33.3
msn.com accessories 33.3
yahoo.com fashion 25.0
yahoo.com accessories 25.0
I added round() to match the precision shown in your example; remove it if you do not need it.
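One way to sanity-check the normalization is to sum the divided clicks back up per site and compare them with the original totals. The query below is only a sketch under the same assumptions as the answer: the sample data is rebuilt with stack(), the keyword column is a comma-separated string, and the names Keyword_array and total_clicks are made up for illustration. round() is left out so that rounding does not distort the sum.

```sql
-- sample data standing in for the real table (same rows as above)
with your_table as (
  select stack(2,
           'msn.com',   'sports,cricket,accessories', 100,
           'yahoo.com', 'fashion,accessories',        50
         ) as (Sitedomain, Keyword, Clicks)
)
select s.Sitedomain,
       -- each exploded row carries Clicks / N, so summing N rows restores Clicks
       sum(s.Clicks / size(s.Keyword_array)) as total_clicks
from (
       select Sitedomain,
              split(Keyword, ',') as Keyword_array,
              Clicks
       from your_table
     ) s
lateral view explode(s.Keyword_array) k as keyword
group by s.Sitedomain
;
```

Each site's total_clicks should come back to roughly its original Clicks value (about 100 for msn.com and 50 for yahoo.com), up to floating-point error. If the column is already an array type rather than a string, the split() step can be dropped and size() applied to the column directly.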