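I have one bag with the schema (url:chararray, mal:float) and another with the schema (url:chararray, links:chararray). I want to parse the links field and intersect the first bag with the parsed links: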
src = LOAD 'hbase://$collection' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:url anchors:links', '-loadKey true') AS (id:bytearray, url:chararray, links:chararray);
mals = LOAD '/tmp/prepare' as (url:chararray, mal:float);
urls = FILTER src BY (links IS NOT null);
urls2 = FOREACH urls GENERATE TOKENIZE(links, '\t') as links, id, url;
processed = FOREACH urls2 {
    grouped = COGROUP links BY $0, mals BY url;
    intersected = FILTER grouped BY NOT IsEmpty(urls) AND NOT IsEmpty(links4);
    weights = FOREACH intersected GENERATE mal;
    GENERATE id, AVG(weights) as mal;
};
This code does not work. The parser fails with:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <file ./Rank.pig, line 11, column 19> [query, statement, foreach_statement, foreach_complex_statement, foreach_clause_complex, foreach_plan_complex, nested_blk, nested_command_list, nested_command, expr, add_expr, multi_expr, cast_expr, unary_expr, expr_eval, var_expr, projectable_expr, func_eval, recoverFromMismatchedToken] mismatched input 'links' expecting LEFT_PAREN
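I am using Pig 0.11.0.
As far as I understand, links is a tuple and mals is a bag, so they cannot be cogrouped. How can I create a bag from links that I can pass to COGROUP?
UPD: Sample dataset: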
/tmp/prepare:
http://1 1.0
http://2 0.9
http://3 0.8
http://4 0.0
HBase:
id: ID
url: http://4
links: http://1 http://2 http://3
Expected output:
{(id: ID, mal: 0.9)}
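
For reference, here is a minimal sketch of a non-nested formulation that might produce this output. It has not been run against the real data. It assumes the links are tab-separated, as in the TOKENIZE call above, and that every link worth counting has a row in /tmp/prepare. The relation names pairs, joined, slim, grouped and result are hypothetical, chosen only for illustration.

```pig
-- Same loads as in the question.
src = LOAD 'hbase://$collection' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:url anchors:links', '-loadKey true')
    AS (id:bytearray, url:chararray, links:chararray);
mals = LOAD '/tmp/prepare' AS (url:chararray, mal:float);

urls = FILTER src BY links IS NOT NULL;

-- TOKENIZE returns a bag of tokens; FLATTEN turns it into one (id, link) row per link.
pairs = FOREACH urls GENERATE id, FLATTEN(TOKENIZE(links, '\t')) AS link;

-- An inner join keeps only links that also appear in mals (the "intersection").
joined = JOIN pairs BY link, mals BY url;

-- Drop the join prefixes so the later projections stay unambiguous.
slim = FOREACH joined GENERATE pairs::id AS id, mals::mal AS mal;

-- Average the matched weights per source row.
grouped = GROUP slim BY id;
result = FOREACH grouped GENERATE group AS id, AVG(slim.mal) AS mal;
```

With the sample dataset, the three links of http://4 match the weights 1.0, 0.9 and 0.8, whose mean is 0.9. This sidesteps the nested block entirely, since COGROUP is not among the operators allowed inside a nested FOREACH.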