I'm trying to store a set of records like these:
2342514224232 | some text here whatever
2342514224234| some more text here whatever
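... in separate files in an output folder, like this:
output/2342514224232
output/2342514224234
The value of idstr should be the file name, and the text should be the file's contents. Here is my Pig script: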
REGISTER /home/bytebiscuit/pig-0.11.1/contrib/piggybank/java/piggybank.jar;
A = LOAD 'cleantweets.csv' using PigStorage(',') AS (idstr:chararray, createdat:chararray, text:chararray,followers:int,friends:int,language:chararray,city:chararray,country:chararray,lat:chararray,lon:chararray);
B = FOREACH A GENERATE idstr, text, language, country;
C = FILTER B BY (country == 'United States' OR country == 'United Kingdom') AND language == 'en';
texts = FOREACH C GENERATE idstr,text;
STORE texts INTO 'output/query_results_one' USING org.apache.pig.piggybank.storage.MultiStorage('output/query_results_one', '0');
Running this Pig script gives the following error:
<file pigquery1.pig, line 12, column 0> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.MultiStorage' with arguments '[output/query_results_one, idstr]'
Any help is much appreciated!
Answer 0 (score: 1)
Try this option:
MultiStorage('output/query_results_one', '0', 'none', ',');
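For context, here is a sketch of how that suggestion would slot into the STORE statement from the question. It assumes the relation `texts` with the schema (idstr, text) from the question's script. The comments describe the four constructor arguments as I understand piggybank's MultiStorage; treat it as an untested illustration, not a confirmed fix.

```pig
-- Sketch: the question's STORE with MultiStorage given four constructor arguments.
--   1. 'output/query_results_one' : parent output directory
--   2. '0'    : 0-based index of the field to split on (field 0 = idstr)
--   3. 'none' : compression setting (no compression)
--   4. ','    : delimiter placed between fields in the written records
STORE texts INTO 'output/query_results_one'
    USING org.apache.pig.piggybank.storage.MultiStorage('output/query_results_one', '0', 'none', ',');
```

Note that the path is passed twice: once to STORE ... INTO and once as MultiStorage's first argument.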
Answer 1 (score: 0)
In case anyone stumbles on this thread like I did: my problem was that my Pig script looked like this:
DEFINE MultiStorage org.apache.pig.piggybank.storage.MultiStorage();
...
STORE stuff INTO 's3:/...' USING MultiStorage('s3:/...','0','none',',');
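The DEFINE statement was the mistake: it did not specify any inputs/outputs (its constructor argument list was empty). Dropping the DEFINE statement and referencing the class directly, as below, fixed the problem: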
STORE stuff INTO 's3:/...' USING org.apache.pig.piggybank.storage.MultiStorage('s3:/...','0','none',',');
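If you would rather keep a short alias, a possible alternative (an untested sketch, not from the answer) is to bind the constructor arguments in the DEFINE itself instead of at the STORE site. The alias name `MultiStorageCsv` is hypothetical, and the path and relation `texts` are borrowed from the question for illustration.

```pig
-- Hypothetical alias; the constructor arguments are bound once, here.
DEFINE MultiStorageCsv org.apache.pig.piggybank.storage.MultiStorage('output/query_results_one', '0', 'none', ',');

-- Use the alias without repeating the arguments.
STORE texts INTO 'output/query_results_one' USING MultiStorageCsv();
```

This keeps the argument list in one place, which avoids the mismatch described above, where the DEFINE had no arguments while the STORE call did.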