How do I accumulate vectors into a map?

Time: 2014-04-08 18:23:10

Tags: apache-pig

I have an alias A like this:

{cookie: chararray,
 keywords: {tuple_of_tokens: (token: chararray)},
 weight: double}

where the 2nd and 3rd fields are defined as

keywords = TOKENIZE((chararray)$5,',');
weight = 1.0/(double)SIZE(keywords);

Now I want to do

foreach (group A by cookie) generate
  group as cookie,
  ???? as keywords;

keywords should be a map from each keyword to the sum of its weights.

For example,

1   k1,k2,k3
1   k2,k4

should become

1   {k1:1/3, k2:5/6, k3:1/3, k4:1/2}
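
To see where the 5/6 for k2 comes from, here is a small arithmetic check in plain Python (not Pig). It assumes, as in the definition of weight above, that a row with n keywords gives each keyword a weight of 1/n; the variable names are illustrative only.

```python
from fractions import Fraction

# Weight contributed by each row to every keyword it contains.
w_row1 = Fraction(1, 3)  # row "k1,k2,k3" has 3 keywords
w_row2 = Fraction(1, 2)  # row "k2,k4" has 2 keywords

# k2 appears in both rows, so its weights add up.
k2_total = w_row1 + w_row2
print(k2_total)
```

k1 and k3 appear only in the first row and k4 only in the second, so they keep their single-row weights.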

I am already using datafu, but I am open to any alternative...

1 answer:

Answer 0 (score: 0)

I would do

A_counts = foreach A generate cookie,flatten(keywords) as keyword,1.0/SIZE(keywords) as weight;

and then

A_counts_gr = group A_counts by (cookie, keyword);
result = foreach A_counts_gr generate flatten(group) as (cookie, token), SUM(A_counts.weight) as weight;

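Then you can group by cookie to get the bag you want... After grouping by cookie again you get one bag of (token, weight) pairs per cookie, and you can then turn that bag into a map with datafu...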
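
The steps above can be traced on the example from the question with a rough Python sketch. This is only an emulation of the data flow, not Pig code and not the datafu conversion itself; the list `A` is a hypothetical stand-in for the alias, and the final dict plays the role of the map.

```python
from collections import defaultdict

# Hypothetical stand-in for alias A: (cookie, keywords) rows from the question.
A = [("1", ["k1", "k2", "k3"]), ("1", ["k2", "k4"])]

# Step 1: flatten keywords, giving one (cookie, keyword, weight) row per keyword.
A_counts = [(cookie, kw, 1.0 / len(kws)) for cookie, kws in A for kw in kws]

# Step 2: group by (cookie, keyword) and sum the weights in each group.
sums = defaultdict(float)
for cookie, kw, weight in A_counts:
    sums[(cookie, kw)] += weight

# Step 3: regroup by cookie and collect each group into a keyword -> weight map.
by_cookie = defaultdict(dict)
for (cookie, kw), total in sums.items():
    by_cookie[cookie][kw] = total
result = dict(by_cookie)
```

Here `result["1"]` holds four keys, with k2 at approximately 0.8333 (that is, 5/6), matching the output asked for in the question.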