Normalizing metrics after LATERAL VIEW explode in Hive

Time: 2019-06-10 08:19:39

Tags: sql hive hiveql array-explode

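I have a table with a field that holds a list of keywords. I am using LATERAL VIEW explode on this table to get the individual elements, but when I do, the metric values are duplicated onto every exploded row as well.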

Table:

Sitedomain      Keyword                             Clicks

msn.com         sports,cricket,accessories           100
yahoo.com       fashion,accessories                   50

After applying the lateral view explode, my output looks like this:

 Sitedomain     Keyword       Clicks

 msn.com        sports        100
 msn.com        cricket       100
 msn.com        accessories   100
 yahoo.com      fashion        50
 yahoo.com      accessories    50

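As you can see, the metric has been exploded too. Is there any way to normalize the data so that the metric is divided by the number of elements in the array? The output would then look like: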

 Sitedomain     Keyword       Clicks

  msn.com        sports        33.3
  msn.com        cricket       33.3
  msn.com        accessories   33.3
  yahoo.com      fashion        25
  yahoo.com      accessories    25

1 answer:

Answer 0 (score: 1)

Divide Clicks by the size of the keyword array:

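-- Sample data standing in for your real table
with your_table as (
  select stack(2,
    'msn.com',   'sports,cricket,accessories', 100,
    'yahoo.com', 'fashion,accessories',         50
  ) as (Sitedomain, Keyword, Clicks)
)

select s.Sitedomain,
       k.keyword,
       -- Hive's / does not do integer division, so 100/3 is not truncated to 33
       round(s.Clicks / size(s.keyword_array), 1) as Clicks
from
(
  -- Split the comma-separated string into an array once per source row
  select Sitedomain,
         split(Keyword, ',') as keyword_array,
         Clicks
  from your_table
) s lateral view explode(s.keyword_array) k as keyword
;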

Returns:

msn.com         sports          33.3
msn.com         cricket         33.3
msn.com         accessories     33.3
yahoo.com       fashion         25.0
yahoo.com       accessories     25.0

I added round() to match the precision shown in your example; remove it if you don't need it.
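As an optional sanity check (a sketch, not part of the original answer), you can confirm that the normalization preserves each site's total: summing the unrounded per-keyword clicks per Sitedomain should give back the original Clicks value, up to floating-point rounding. It reuses the same stack() sample data; the CTE name normalized and the column total_clicks are illustrative names only.

```sql
-- Sanity check: per-keyword clicks should add back up to the original total per site
with your_table as (
  select stack(2,
    'msn.com',   'sports,cricket,accessories', 100,
    'yahoo.com', 'fashion,accessories',         50
  ) as (Sitedomain, Keyword, Clicks)
),
normalized as (
  -- Same split + explode as in the answer, but without round()
  select s.Sitedomain,
         k.keyword,
         s.Clicks / size(s.keyword_array) as Clicks
  from (
    select Sitedomain, split(Keyword, ',') as keyword_array, Clicks
    from your_table
  ) s lateral view explode(s.keyword_array) k as keyword
)
select Sitedomain, sum(Clicks) as total_clicks
from normalized
group by Sitedomain;
```

The check deliberately sums the unrounded quotient: the rounded values would not add back up exactly (3 × 33.3 = 99.9 rather than 100).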