How to structure unstructured data using Apache Pig

Time: 2016-03-11 10:45:47

Tags: hadoop apache-pig

I have a file containing the following lines:

3124,"hello...",ku4
3125,"hello,hi",ab2

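I want to load the file so that it has three columns. I am using PigStorage(','), but it also splits "hello,hi" into two fields. I want it to stay in a single field.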

How can I do this?

1 Answer:

Answer 0: (score: 0)

You can write your own custom UDF or use the CSVLoader in piggybank.jar:
-- Get a piggybank.jar that is compatible with your Pig version and register
-- it in your Pig script by pointing to the location of the jar file.

REGISTER piggybank.jar;

-- CSVLoader splits on commas and keeps commas inside double-quoted fields.
A = LOAD 'test.txt' USING org.apache.pig.piggybank.storage.CSVLoader() AS (f1:int,f2:chararray,f3:chararray);
B = FOREACH A GENERATE f1, f2, f3;
DUMP B;
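
To see why a plain delimiter split breaks the second row while a quote-aware CSV parser does not, here is a minimal sketch in Python. It is only an illustration, not Pig code: `naive_split` and `quote_aware_split` are hypothetical helper names, and the naive version only roughly approximates what a plain delimiter-based loader such as PigStorage(',') does with the two sample lines from the question.

```python
import csv
import io

# The two sample lines from the question.
SAMPLE = '3124,"hello...",ku4\n3125,"hello,hi",ab2\n'


def naive_split(text):
    """Split every line on each comma, ignoring double quotes."""
    return [line.split(",") for line in text.splitlines()]


def quote_aware_split(text):
    """Parse with csv.reader, which keeps commas inside double quotes."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    print("naive split:")
    for row in naive_split(SAMPLE):
        print(len(row), row)

    print("quote-aware split:")
    for row in quote_aware_split(SAMPLE):
        print(len(row), row)
```

With the naive split, the second line yields four fields because the comma inside "hello,hi" is treated as a delimiter. The quote-aware parser returns three fields for both lines, which is the behaviour the question asks for.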