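I have a CSV file whose fields are wrapped in a text qualifier (double quotes, "). I want to load the data into HDFS using Pig, Hive, or HBase with the text qualifier removed. Could you please help?
My file, input.CSV: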
"Id","Name"
"1","Raju"
"2","Anitha"
"3","Rakesh"
I want the output to look like this:
Id,Name
1,Raju
2,Anitha
3,Rakesh
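Answer 0 (score: 0)
Try this in a Pig script.
Assume your input file is named input.csv.
1. First copy the input file to HDFS with the copyFromLocal command, for example: hadoop fs -copyFromLocal input.csv /user/test/
2. Then run the Pig script below.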
PigScript:
HDFS mode:
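-- Read each line as a single chararray (assumes the data contains no tabs, the default PigStorage delimiter).
A = LOAD 'hdfs://<hostname>:<port>/user/test/input.csv' AS (line:chararray);
-- Capture the text inside the quotes; id is kept as chararray because casting the header value "Id" to int would yield null.
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'"(.*)","(.*)"')) AS (id:chararray,name:chararray);
STORE B INTO '/user/test/output' USING PigStorage(',');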
Local mode:
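-- Read each line as a single chararray (assumes the data contains no tabs, the default PigStorage delimiter).
A = LOAD 'input.csv' AS (line:chararray);
-- Capture the text inside the quotes; id is kept as chararray because casting the header value "Id" to int would yield null.
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'"(.*)","(.*)"')) AS (id:chararray,name:chararray);
STORE B INTO 'output' USING PigStorage(',');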
Output:
Id,Name
1,Raju
2,Anitha
3,Rakesh
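
The regular expression above is tied to exactly two columns. If the file had more columns, one possible variant is to strip every double quote from the line with Pig's built-in REPLACE function instead. This is only a minimal sketch in local mode: the output directory name output_noquotes is a hypothetical choice, and it assumes that no field contains an embedded comma, double quote, or tab.

```pig
-- Sketch of a column-count-independent variant (not the method used in the answer above).
-- Assumption: no field contains an embedded comma, double quote, or tab.
A = LOAD 'input.csv' AS (line:chararray);

-- REPLACE(string, regex, replacement): a bare double quote matches itself,
-- so every " in the line is replaced with the empty string.
B = FOREACH A GENERATE REPLACE(line, '"', '') AS line;

-- Each record holds a single field, so the ',' delimiter is never actually written;
-- the commas in the output are the ones already present in the input line.
-- 'output_noquotes' is a hypothetical output directory name.
STORE B INTO 'output_noquotes' USING PigStorage(',');
```

For the sample file this should give the same four lines as the output shown above. The trade-off is that it removes every double quote, including any that are legitimately part of a value, so the regex approach stays the safer choice when the data may contain quoted commas or quotes.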