Question

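My data contains location, sentiment and brand fields. I want to count the positives, negatives and neutrals for each brand at each location.

Assuming x holds the data, I did: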

a1 = GROUP x BY (location, brand);
a2 = FOREACH a1 GENERATE FLATTEN(group) AS (location, brand), COUNT(x.sentiment=="positive"?1:0) AS positive_count, COUNT(x.sentiment=="negative"?1:0) AS negative_count, COUNT(x.sentiment=="neutral:?1:0) as neutral_count;

But I get the syntax error Unexpected character '"'

I also tried grouping by all three fields (location, sentiment and brand), but then I only get one overall count per row:

{location: "newyork", brand: "pampers", sentiment = "positive", count = 10}
{location: "newyork", brand: "pampers", sentiment = "negative", count = 2}
{location: "newyork", brand: "pampers", sentiment = "neutral", count = 20}

I want separate fields for positive_count, negative_count and neutral_count. Like this:

{location: "newyork", brand: "pampers", positive_count = 10, negative_count = 2, neutral_count = 20}
{location: "london", brand: "pampers", positive_count = 12, negative_count = 0, neutral_count = 35}
{location: "newyork", brand: "huggies", positive_count = 40, negative_count = 6, neutral_count = 10}

Can anyone help me?

Answer 1

Use single quotes

a1 = GROUP x BY (location, brand);
a2 = FOREACH a1 GENERATE FLATTEN(group) AS (location, brand), 
                    COUNT(x.sentiment=='positive'?1:0) AS positive_count, 
                    COUNT(x.sentiment=='negative'?1:0) AS negative_count, 
                    COUNT(x.sentiment=='neutral'?1:0) as neutral_count;

Edit

newyork pampers positive
newyork pampers positive
newyork pampers negative
newyork pampers positive
newyork pampers positive
newyork pampers neutral
newyork pampers positive
newyork pampers negative
newyork pampers neutral
newyork pampers positive
newyork pampers positive
newyork pampers neutral

Script

B = GROUP A BY (location,brand);
C = FOREACH B {
        -- split each group's bag by sentiment, then count each sub-bag
        A1 = FILTER A BY sentiment matches 'positive';
        A2 = FILTER A BY sentiment matches 'negative';
        A3 = FILTER A BY sentiment matches 'neutral';
        GENERATE FLATTEN(group) AS (location,brand), COUNT(A1), COUNT(A2), COUNT(A3);
    };

Output

(newyork,pampers,7,2,3)
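Another common way to get all three counts in one pass is to turn the condition into 0/1 flags before grouping and then SUM the flags. COUNT counts every non-null element of a bag regardless of its value, so a 0 would be counted just like a 1. The sketch below assumes x was loaded with chararray fields (location, brand, sentiment). The aliases flagged, grouped and counts and the is_* field names are made up for illustration.

```pig
-- Assumption: x has the schema (location:chararray, brand:chararray, sentiment:chararray).
-- Step 1: tag every row with one 0/1 flag per sentiment value.
flagged = FOREACH x GENERATE location, brand,
              (sentiment == 'positive' ? 1 : 0) AS is_positive,
              (sentiment == 'negative' ? 1 : 0) AS is_negative,
              (sentiment == 'neutral'  ? 1 : 0) AS is_neutral;

-- Step 2: group by (location, brand) and add up the flags.
grouped = GROUP flagged BY (location, brand);
counts  = FOREACH grouped GENERATE FLATTEN(group) AS (location, brand),
              SUM(flagged.is_positive) AS positive_count,
              SUM(flagged.is_negative) AS negative_count,
              SUM(flagged.is_neutral)  AS neutral_count;

DUMP counts;
```

A (location, brand) pair with no rows for some sentiment still appears in the result with a 0 in that column, which matches the london/pampers row the question asks for.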

Answer 2

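I filtered the alias that holds the raw data once per sentiment, computed the count for each filtered set, and then joined the results together.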


A bit verbose, but it works.
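A minimal sketch of what the filter, count and join approach could look like, assuming x has chararray fields (location, brand, sentiment). Every alias name here (pos, neg, neu, pos_c and so on) is hypothetical, and this is not necessarily the script the answer used.

```pig
-- Assumption: x has the schema (location:chararray, brand:chararray, sentiment:chararray).
-- Step 1: one filtered relation per sentiment value.
pos = FILTER x BY sentiment == 'positive';
neg = FILTER x BY sentiment == 'negative';
neu = FILTER x BY sentiment == 'neutral';

-- Step 2: count rows per (location, brand) in each filtered relation.
pos_g = GROUP pos BY (location, brand);
neg_g = GROUP neg BY (location, brand);
neu_g = GROUP neu BY (location, brand);
pos_c = FOREACH pos_g GENERATE FLATTEN(group) AS (location, brand), COUNT(pos) AS positive_count;
neg_c = FOREACH neg_g GENERATE FLATTEN(group) AS (location, brand), COUNT(neg) AS negative_count;
neu_c = FOREACH neu_g GENERATE FLATTEN(group) AS (location, brand), COUNT(neu) AS neutral_count;

-- Step 3: join the three count relations on (location, brand).
joined = JOIN pos_c BY (location, brand),
              neg_c BY (location, brand),
              neu_c BY (location, brand);
result = FOREACH joined GENERATE
             pos_c::location       AS location,
             pos_c::brand          AS brand,
             pos_c::positive_count AS positive_count,
             neg_c::negative_count AS negative_count,
             neu_c::neutral_count  AS neutral_count;
```

One caveat with this sketch: the join is an inner join, so a (location, brand) pair that has no rows for one of the sentiments is dropped from the result instead of showing a 0 for it.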

Pig: Count only specific rows
