Pig Latin - loading data with text qualifiers

Posted: 2014-11-11 12:08:51

Tags: text load apache-pig qualifiers

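I am trying to load a data file in a Pig Latin script. The data has 2 columns, but the 2nd column is wrapped in a text qualifier (double quotes) and itself contains commas. Sample data: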

DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"

When I try to load the data as shown below, the second column is not recognized as a single column:

deviceList = load 'deviceList.csv' Using PigStorage(',') as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );

How can I define a text qualifier when loading the dataset?
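For reference: PigStorage has no concept of a text qualifier. It splits on every delimiter, including the commas inside the quotes. A loader that does honor double-quoted fields is `CSVExcelStorage` from Pig's piggybank contrib library. The following is a minimal sketch, assuming piggybank is available and that your Pig version supports the four-argument constructor with header handling (as far as I know, Pig 0.12 or later). The jar path is a placeholder, not a real location:

```
-- Placeholder path: point this at the piggybank.jar shipped with your Pig installation.
REGISTER '/path/to/piggybank.jar';

-- Arguments: delimiter, multiline handling, end-of-line style, header handling.
-- SKIP_INPUT_HEADER drops the DEVICE_ID,SUPPORTED_TECH line.
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER');

-- Commas inside "..." stay part of the field instead of starting a new one.
deviceList = LOAD 'deviceList.csv' USING CSVLoader() AS (DEVICE_ID:chararray, SUPPORTED_TECH:chararray);
DUMP deviceList;
```

With this loader, SUPPORTED_TECH should hold the whole list (for example GSM900,GSM1500,GSM200) as one field, with the enclosing quotes removed.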

1 answer:

Answer 0 (score: 1)

Try this, and let me know if you need a different output format.

input.txt:

DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"

Pig script:

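-- Load each line whole, so the commas inside the quotes are not split.
A = LOAD 'input.txt' AS (line:chararray);
-- (\\w+) captures the ID up to the first comma; (.*) captures the rest of the line.
deviceList = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\w+),(.*)$')) AS (DEVICE_ID:chararray, SUPPORTED_TECH:chararray);
DUMP deviceList;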

Output:

(DEVICE_ID,SUPPORTED_TECH)
(a2334,"GSM900,GSM1500,GSM200")
(a54623,"GSM900,GSM1500")
(a86646,"GSM1500,GSM200")
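The output above still contains the header row and the surrounding double quotes. If a cleaner format is wanted, the following sketch shows one possible follow-up. It drops the header, strips the quotes with the built-in REPLACE, and splits the list into a bag with the built-in TOKENIZE. The relation names are illustrative, and the target format is an assumption:

```
A = LOAD 'input.txt' AS (line:chararray);
parsed = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '^(\\w+),(.*)$')) AS (DEVICE_ID:chararray, SUPPORTED_TECH:chararray);

-- Drop the header row (lines the regex did not match have null fields and are dropped too).
noHeader = FILTER parsed BY DEVICE_ID != 'DEVICE_ID';

-- Remove the double-quote text qualifiers.
cleaned = FOREACH noHeader GENERATE DEVICE_ID, REPLACE(SUPPORTED_TECH, '"', '') AS SUPPORTED_TECH;

-- Split the comma-separated list into a bag of single-field tuples.
techs = FOREACH cleaned GENERATE DEVICE_ID, TOKENIZE(SUPPORTED_TECH, ',') AS tech_bag;
DUMP techs;
```

To get one row per device/technology pair instead of a bag, use `FLATTEN(TOKENIZE(SUPPORTED_TECH, ',')) AS tech` in the last FOREACH.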