I tried the following:
C = FOREACH A GENERATE dataMap#'$key_name' AS C_ID, dataMap#'$key_name2' AS methodName, pool, time;
D = GROUP C BY (C_ID);
E = FOREACH D {
    sorted = ORDER C BY time DESC;
    GENERATE group, C.methodName AS Flow;
};
F = GROUP E BY (Flow);
G = FOREACH F {
    GENERATE group, COUNT(E) AS FlowKount;
};
STORE G INTO '$output' USING PigStorage();
But I get the error: Using Bag as key not supported.
The data corresponding to E in the above program:
c1 {(m1), (m2), (m3) }
c2 {(m1), (m2), (m3) }
c3 {(m2), (m1), (m3) }
c4 {(m1), (m2), (m3) }
c5 {(m2), (m1), (m3) }
I need the output to be:
{(m1), (m2), (m3) } {(c1),(c2),(c4)} 3
{(m2), (m1), (m3) } {(c3),(c5)} 2
That is: methods, C_Ids, and count.
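To pin down the result I am after, here is a minimal sketch in plain Python (not Pig), using the sample rows for E shown above. The function name `group_flows` and the in-memory list are made up purely for illustration, and I am assuming that the order of methods within a flow matters, so (m1, m2, m3) and (m2, m1, m3) count as different flows.

```python
from collections import defaultdict

# Sample rows copied from relation E above: (C_ID, ordered methods).
E = [
    ("c1", ("m1", "m2", "m3")),
    ("c2", ("m1", "m2", "m3")),
    ("c3", ("m2", "m1", "m3")),
    ("c4", ("m1", "m2", "m3")),
    ("c5", ("m2", "m1", "m3")),
]


def group_flows(rows):
    """Group C_IDs by their ordered method sequence and count each group."""
    by_flow = defaultdict(list)
    for c_id, flow in rows:
        # A tuple is hashable, so the whole ordered flow can act as the key.
        by_flow[tuple(flow)].append(c_id)
    # Dicts keep insertion order, so flows appear in first-seen order.
    return [(flow, ids, len(ids)) for flow, ids in by_flow.items()]


for flow, ids, count in group_flows(E):
    print(flow, ids, count)
# ('m1', 'm2', 'm3') ['c1', 'c2', 'c4'] 3
# ('m2', 'm1', 'm3') ['c3', 'c5'] 2
```

This only describes the grouping I want; what I am missing is the equivalent in Pig, where a bag apparently cannot be used as the GROUP key.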
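In other words, I want to find the flows (bags containing the same methods) that repeat across different C_IDs. Can someone guide me on how to achieve this?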