Hi everyone, I have a question about loading data with Apache Pig. The file format looks like this:
"1","2","xx,yy","a,sd","3"
So I want to load it using the multi-character delimiter ","
(two double quotes around a comma):
A = LOAD 'file.csv' USING PigStorage('","') AS (f1,f2,f3,f4,f5);
But PigStorage does not accept the multi-character delimiter ",".
How can I do this? Thank you very much!
Answer 0 (score: 0)
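PigStorage takes a single character as its delimiter. Instead, you can use the CSVLoader function from PiggyBank. Download piggybank.jar and save it in the same folder as your Pig script, then register the jar in the script.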
REGISTER piggybank.jar;
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
A = LOAD 'test1.txt' USING CSVLoader(',') AS (f1:int,f2:int,f3:chararray,f4:chararray,f5:int);
B = FOREACH A GENERATE f1,f2,f3,f4,f5;
DUMP B;
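An alternative is to load each line as a single field and then split it with STRSPLIT. Note that the first and last fields keep their outer double quotes, because the split only removes the "," between fields: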
A = LOAD 'test1.txt' USING TextLoader() AS (line:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(line, '","'));
DUMP B;
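If the leftover quote at the start and end of each line is a problem, one possible variation is to strip them with the built-in REPLACE function before splitting. This is only a sketch that has not been run against your data. It assumes every line has exactly five quoted fields, as in the sample, and the field names f1 to f5 are just taken from the question.

```pig
-- Load each line as one chararray field.
A = LOAD 'test1.txt' USING TextLoader() AS (line:chararray);

-- Remove the leading and trailing double quote, then split on the "," delimiter.
-- REPLACE and STRSPLIT both treat their second argument as a Java regular expression.
B = FOREACH A GENERATE
        FLATTEN(STRSPLIT(REPLACE(line, '^"|"$', ''), '","'))
        AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray, f5:chararray);

-- Optionally cast the numeric columns (assumes they always contain valid integers).
C = FOREACH B GENERATE (int)f1 AS f1, (int)f2 AS f2, f3, f4, (int)f5 AS f5;
DUMP C;
```

The commas inside fields such as xx,yy are preserved, because the split pattern only matches a comma that sits between two double quotes.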