Loading a CSV file in Pig Latin

Posted: 2015-08-21 10:47:27

Tags: csv apache-pig

I am trying to load a CSV file in Pig Latin. The records look like this:

"ABBOTT,DEEDEE W",GRADES 9-12 TEACHER,"52,122.10",0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010

I tried the following code:

A = LOAD '/user/hduser/salaryTravel.csv' using PigStorage(',')  AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);

But the output is:

("ABBOTT,DEEDEE W",,,122.10",0,)

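The name is split across separate fields because it contains a comma (','), and PigStorage splits on every comma, including those inside double quotes. How can I read this record correctly?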

Answer (score: 2):

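Use the CSVExcelStorage or CSVLoader loader from Piggybank to load the data. Both understand double-quoted fields, so a comma inside quotes is not treated as a field delimiter.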

REGISTER piggybank.jar;

A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage()  AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);

Alternatively, with CSVLoader:

REGISTER piggybank.jar;

A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVLoader() AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);
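One caveat with both loaders: the quoted salary "52,122.10" comes back as the string 52,122.10, and the thousands separator makes the implicit cast to float fail, so Pig will most likely put null in that column (with a conversion warning). The following is a minimal sketch, assuming piggybank.jar is registered from the path shown and that every record has the same seven-column layout as the sample. It loads the numeric columns as chararray, strips the commas with the built-in REPLACE, and then casts them. The relation names raw and salaries are illustrative only.

```pig
-- Assumption: piggybank.jar is reachable at this path; adjust for your installation.
REGISTER piggybank.jar;

-- Load salary and TA as chararray, because values such as "52,122.10"
-- contain a thousands separator and would not cast directly to float.
raw = LOAD '/user/hduser/salaryTravel.csv'
      USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
      AS (name:chararray, job:chararray, salary:chararray, TA:chararray,
          type:chararray, org:chararray, year:int);

-- Remove the commas (REPLACE takes a regex; ',' matches a literal comma),
-- then cast the cleaned strings to float.
salaries = FOREACH raw GENERATE
    name,
    job,
    (float) REPLACE(salary, ',', '') AS salary,
    (float) REPLACE(TA, ',', '') AS TA,
    type,
    org,
    year;

DUMP salaries;
```

The same FOREACH works unchanged if raw is loaded with CSVLoader() instead of CSVExcelStorage(',').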

Reference: REGEX_EXTRACT error in PIG, where I shared some code examples.