Storing records in separate files with MultiStorage

Posted: 2013-12-12 17:51:43

Tags: hadoop apache-pig

I'm trying to store a set of records like these:

2342514224232 | some text here whatever
2342514224234 | some more text here whatever

... into separate files in an output folder, like this:

output/2342514224232
output/2342514224234

The value of idstr should be the file name, and the text should be the content of the file. Here is my Pig code:

REGISTER /home/bytebiscuit/pig-0.11.1/contrib/piggybank/java/piggybank.jar;

A = LOAD 'cleantweets.csv' using PigStorage(',') AS (idstr:chararray, createdat:chararray, text:chararray,followers:int,friends:int,language:chararray,city:chararray,country:chararray,lat:chararray,lon:chararray); 

B = FOREACH A GENERATE idstr, text, language, country;

C = FILTER B BY (country == 'United States' OR country == 'United Kingdom') AND language == 'en';

texts = FOREACH C GENERATE idstr,text;

STORE texts INTO 'output/query_results_one' USING org.apache.pig.piggybank.storage.MultiStorage('output/query_results_one', '0');

Running this Pig script gives the following error:

<file pigquery1.pig, line 12, column 0> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.MultiStorage' with arguments '[output/query_results_one, idstr]'
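Note that the error reports the arguments as `[output/query_results_one, idstr]`, not the `'0'` shown in the script above. MultiStorage's second argument is the position of the split field, and in the piggybank source I am aware of, the constructor converts it with `Integer.parseInt`. A field name such as `idstr` then throws `NumberFormatException`, which Pig surfaces as "could not instantiate". The sketch below isolates that single step; `SplitIndexSketch`, `parseSplitIndex` and `accepts` are hypothetical names, not piggybank methods.

```java
// Hypothetical isolation of the split-index check. It assumes MultiStorage
// parses its second constructor argument with Integer.parseInt.
public class SplitIndexSketch {

    // Returns the zero-based field position, or throws NumberFormatException
    // when the argument is a field name instead of a number.
    public static int parseSplitIndex(String arg) {
        return Integer.parseInt(arg);
    }

    // True if a storer built with this argument would get past the parse step.
    public static boolean accepts(String arg) {
        try {
            parseSplitIndex(arg);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("'0' accepted: " + accepts("0"));         // true
        System.out.println("'idstr' accepted: " + accepts("idstr")); // false
    }
}
```

In other words, pass the position of `idstr` within `texts` (`'0'`), not its name.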

Any help would be much appreciated!

2 answers:

Answer 0 (score: 1)

Try this option:

STORE texts INTO 'output/query_results_one' USING org.apache.pig.piggybank.storage.MultiStorage('output/query_results_one', '0', 'none', ',');
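To make the four-argument form concrete (output path, split field index `'0'`, compression `'none'`, field delimiter `','`), here is a plain-Java sketch, not the piggybank code, of the layout MultiStorage is expected to produce. As I understand its Javadoc, MultiStorage creates one subdirectory per distinct key value and writes part files inside it, so `output/2342514224232` ends up as a directory rather than a single file. The `<key>/<key>-0000` naming and the names `MultiStorageLayoutSketch` and `route` are assumptions for illustration; the real part-file suffix is assigned by the Hadoop task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of MultiStorage's output layout; not the piggybank implementation.
public class MultiStorageLayoutSketch {

    // Routes each tuple to "<parent>/<key>/<key>-0000", where key is the value of
    // the field at splitIndex, and serializes the whole tuple joined by fieldDel.
    public static Map<String, List<String>> route(String parent, List<String[]> tuples,
                                                  int splitIndex, String fieldDel) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String[] tuple : tuples) {
            String key = tuple[splitIndex];
            // "-0000" is an assumed placeholder for the task's part number.
            String path = parent + "/" + key + "/" + key + "-0000";
            files.computeIfAbsent(path, p -> new ArrayList<>())
                 .add(String.join(fieldDel, tuple));
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> texts = List.of(
            new String[] {"2342514224232", "some text here whatever"},
            new String[] {"2342514224234", "some more text here whatever"});
        route("output/query_results_one", texts, 0, ",")
            .forEach((path, lines) -> System.out.println(path + " -> " + lines));
    }
}
```

With the two sample records this prints one path per idstr, each holding a single comma-delimited line.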

Answer 1 (score: 0)

In case anyone else stumbles on this thread like I did: my problem was that my Pig script looked like this:

DEFINE MultiStorage org.apache.pig.piggybank.storage.MultiStorage();
...
STORE stuff INTO 's3:/...' USING MultiStorage('s3:/...','0','none',',');

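The DEFINE statement mistakenly did not specify the inputs/outputs: it passes MultiStorage no constructor arguments at all. Dropping the DEFINE statement and writing the following directly fixed the problem: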

STORE stuff INTO 's3:/...' USING org.apache.pig.piggybank.storage.MultiStorage('s3:/...','0','none',',');
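One plausible reading of this failure: the DEFINE registers MultiStorage with an empty argument list, and Pig creates UDFs by reflection, so it would need a constructor that takes no arguments. Assuming, as in the piggybank versions I know of, that MultiStorage only has constructors taking the output path and split index (plus optional strings), that lookup fails. The sketch below shows the difference with plain `java.lang.reflect` on a hypothetical stand-in class, `StorerStandIn`; it is not Pig's actual instantiation code.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

public class DefineSketch {

    // Hypothetical stand-in with constructor shapes modeled on MultiStorage:
    // there is deliberately no zero-argument constructor.
    public static class StorerStandIn {
        public final String parentPath;
        public final int splitIndex;
        public final String compression;
        public final String fieldDel;

        public StorerStandIn(String parentPath, String splitIndex) {
            this(parentPath, splitIndex, "none", ",");
        }

        public StorerStandIn(String parentPath, String splitIndex,
                             String compression, String fieldDel) {
            this.parentPath = parentPath;
            this.splitIndex = Integer.parseInt(splitIndex);
            this.compression = compression;
            this.fieldDel = fieldDel;
        }
    }

    // Builds the stand-in from string arguments by reflection: a spec with
    // N arguments needs a public constructor taking N Strings.
    public static StorerStandIn instantiate(String... args) {
        Class<?>[] types = new Class<?>[args.length];
        Arrays.fill(types, String.class);
        try {
            Constructor<StorerStandIn> ctor = StorerStandIn.class.getConstructor(types);
            return ctor.newInstance((Object[]) args);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(
                "could not instantiate with " + args.length + " arguments", e);
        }
    }

    public static void main(String[] args) {
        // Like USING ...MultiStorage('s3:/...','0','none',','): a 4-argument constructor exists.
        System.out.println(instantiate("s3:/...", "0", "none", ",").splitIndex);
        try {
            // Like DEFINE ... MultiStorage(): no zero-argument constructor, so this fails.
            instantiate();
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An alternative that should also avoid the problem is to keep the DEFINE but put the arguments in it, so the alias already carries the full constructor call.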