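Amazon EMR-4.5, Hadoop 2.7.2, Pig 0.14
I want to project the file name field plus selected fields into a new relation after loading with the -tagFile option. The results don't seem to make sense. Example:
tagfile-test.txt (tab-delimited)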
AAA 123 2016
BBB 456 2016
CCC 789 2016
DUMP right after LOAD
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
DUMP test;
(tagfile-test.txt,AAA,123,2016)
(tagfile-test.txt,BBB,456,2016)
(tagfile-test.txt,CCC,789,2016)
Correct - GENERATE f0, f1, f2
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
project = FOREACH test GENERATE f0, f1, f2;
DUMP project;
(tagfile-test.txt,AAA,123)
(tagfile-test.txt,BBB,456)
(tagfile-test.txt,CCC,789)
Incorrect - GENERATE f0, f1, f3 (result is the same as above)
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
project = FOREACH test GENERATE f0, f1, f3;
DUMP project;
(tagfile-test.txt,AAA,123)
(tagfile-test.txt,BBB,456)
(tagfile-test.txt,CCC,789)
Incorrect - GENERATE f0, f2, f3 (confirmation)
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
project = FOREACH test GENERATE f0, f2, f3;
DUMP project;
(tagfile-test.txt,AAA,2016)
(tagfile-test.txt,BBB,2016)
(tagfile-test.txt,CCC,2016)
It looks like Pig is not resolving the field names correctly. I also tried field positions ($0, $1, $2, $3) and got the same results.
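For reference, the positional variant of the second case would presumably look like the sketch below. The exact statement is an assumption, since the question only says that positions were tried; the relation names are reused from the examples above.

```pig
-- Same load as above; fields are then referenced by position instead of by name.
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);

-- $0 is the file name added by -tagFile, $1 the first data column, $3 the third data column.
project = FOREACH test GENERATE $0, $1, $3;
DUMP project;
```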
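Answer 0 (score: 1)
I ran into the same problem when using the tagFile option with PigStorage. I found a good explanation of the cause, and solved the problem by adding the following line to the Pig script:
set pig.optimizer.rules.disabled 'ColumnMapKeyPrune';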
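As a minimal sketch of where that line would go, here is the failing example from the question with the setting placed at the top of the script. It assumes the rule meant here is ColumnMapKeyPrune, Pig's column-pruning optimizer rule, and that the setting has to come before the LOAD it should affect. It has not been verified against EMR-4.5.

```pig
-- Assumption: disabling the column-pruning rule stops the projection from
-- being shifted by the extra file name column that -tagFile adds.
set pig.optimizer.rules.disabled 'ColumnMapKeyPrune';

test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);

-- The case that returned the wrong columns in the question.
project = FOREACH test GENERATE f0, f1, f3;
DUMP project;
```

Note that turning this rule off means Pig no longer prunes unused columns for any load in the script, so it may read more data than necessary on wide inputs.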
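Answer 1 (score: 0)
It looks like the fields are separated by ',' but you are using '\t' as the delimiter in PigStorage. Also, specify the data types of the fields.
Try changing this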
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
to
test = LOAD 'tagfile-test.txt' USING PigStorage(',','-tagFile') AS (f0:chararray, f1:chararray, f2:int, f3:int);