我正在尝试在pig latin脚本中加载数据文件, 数据有2列,但第2列中有文本限定符,示例数据如下:
DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"
当我尝试按如下所示加载日期时,第二列不会被识别为1列
deviceList = load 'deviceList.csv' Using PigStorage(',') as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );
如何在加载数据集时定义文本限定符?
答案 0 :(得分:1)
试试这个,如果你需要不同的输出格式,请告诉我
<强> input.txt中强>
DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200
<强> PigScript:强>
A = LOAD 'input.txt' AS line;
deviceList = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\w+),(.*)$')) as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );
DUMP deviceList;
<强>输出:强>
(DEVICE_ID,SUPPORTED_TECH)
(a2334,"GSM900,GSM1500,GSM200")
(a54623,"GSM900,GSM1500")
(a86646,"GSM1500,GSM200")