I have a dataset that looks like this:
DUMP A;
(10000,({(10000),(20000),(50000)},{(10000),(20000),(30000)}))
(20000,({(10000),(20000),(50000)},{(20000)},{(10000),(20000),(30000)}))
(30000,({(30000)},{(10000),(20000),(30000)}))
(40000,({(40000)},{(40000),(50000)}))
(50000,({(40000),(50000)},{(10000),(20000),(50000)}))
DESCRIBE A;
{foo: bytearray, bar_gp: (baz: {(foo: bytearray)})}
I ultimately want it to look like this:
DUMP A;
(10000,{(10000),(20000),(50000),(30000)})
(20000,{(10000),(20000),(50000),(30000)})
(30000,{(10000),(20000),(30000)})
(40000,{(40000),(50000)})
(50000,{(40000),(50000),(10000),(20000)})
I tried the following:
B = FOREACH A GENERATE $0, FLATTEN($1);
C = FOREACH B {D = FOREACH B GENERATE FLATTEN($1); D= DISTINCT D; GENERATE $0, D; }
But I keep getting this error:
expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)
How can I get the desired output? I know I could write a UDF to parse this, but I would like to find a built-in solution.
Answer 0 (score: 0)
I think you need to apply DISTINCT to the bag before flattening it.
B = FOREACH A {
    -- remove duplicates inside the nested block first
    D = DISTINCT $1;
    -- then flatten the de-duplicated result
    GENERATE $0, FLATTEN(D);
};
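For checking whatever script you end up with, it may help to pin down the target semantics: for each row, take the union of all the bags and drop duplicates. Below is a minimal Python sketch of that, not a Pig solution. The function name `merge_bags` and the list-based representation of bags are assumptions made for illustration, and since Pig bags are unordered, only the set of values per row is meaningful.

```python
def merge_bags(bags):
    """Union several bags into one list, dropping duplicates (first-seen order)."""
    seen = set()
    merged = []
    for bag in bags:
        for item in bag:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Sample rows from the question: (key, list of bags)
rows = [
    (10000, [[10000, 20000, 50000], [10000, 20000, 30000]]),
    (20000, [[10000, 20000, 50000], [20000], [10000, 20000, 30000]]),
    (30000, [[30000], [10000, 20000, 30000]]),
    (40000, [[40000], [40000, 50000]]),
    (50000, [[40000, 50000], [10000, 20000, 50000]]),
]

# One merged, de-duplicated bag per key
result = [(key, merge_bags(bags)) for key, bags in rows]
```

Comparing each row of `result` as a set against the desired output in the question should match for all five keys.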