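How do I store Pig output as Ctrl-A delimited output so it can be loaded into Hive?
Answer 0: (score: 6)
To get the expected result, follow the steps below.
Store your relation with the following command: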
STORE <Relation> INTO '<file_path>' USING PigStorage('\u0001');
Then point a Hive external table at the generated file:
hive> CREATE EXTERNAL TABLE TEMP(
c1 INT,
c2 INT,
c3 INT,
c4 INT
.....
)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '<file_path>';
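The two statements spell the delimiter differently: the Pig STORE uses the Unicode escape '\u0001', while the Hive DDL uses the octal escape '\001'. Both denote the same Ctrl-A character. As a quick check outside either tool, here is a small Python sketch (the variable names are illustrative and not part of Pig or Hive):

```python
# Python understands both escape styles, so the two spellings can be compared directly.
pig_delimiter = "\u0001"   # Unicode escape, as written in PigStorage('\u0001')
hive_delimiter = "\001"    # octal escape, as written in TERMINATED BY '\001'

# Both escapes resolve to code point 1, i.e. Ctrl-A.
assert pig_delimiter == hive_delimiter == chr(1)

# In ASCII/UTF-8 the delimiter is the single byte 0x01.
delimiter_bytes = pig_delimiter.encode("utf-8")
print(delimiter_bytes)
```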
If the output file is in a local Linux directory instead, create the table:
hive> CREATE TABLE TEMP(
c1 INT,
c2 INT,
c3 INT,
c4 INT
.....
)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
and load the data into the table:
hive> load data local inpath '<file_path>' into table temp;
Answer 1: (score: 1)
Can you try it like this?
STORE <OutputRelation> INTO '<Outputfile>' USING PigStorage('\u0001');
Example:
input.txt
1,2,3,4
5,6,7,8
9,10,11,12
PigScript:
A = LOAD 'input.txt' USING PigStorage(',');
STORE A INTO 'out' USING PigStorage('\u0001');
Output:
1^A2^A3^A4
5^A6^A7^A8
9^A10^A11^A12
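The ^A in the output above is caret notation for the otherwise invisible Ctrl-A character. The following Python sketch mimics, for the sample data only, what the Pig script does to each record (split on commas, rejoin with Ctrl-A). The helpers to_ctrl_a and show_caret are hypothetical names introduced for illustration. This shows the file format and is not how Pig is implemented.

```python
CTRL_A = "\u0001"  # the character PigStorage('\u0001') writes between fields


def to_ctrl_a(line: str, source_delimiter: str = ",") -> str:
    """Re-delimit one text record with Ctrl-A (illustrative helper)."""
    return CTRL_A.join(line.split(source_delimiter))


def show_caret(line: str) -> str:
    """Render Ctrl-A in caret notation so it is visible when printed."""
    return line.replace(CTRL_A, "^A")


# Same three records as input.txt in the example.
sample_input = ["1,2,3,4", "5,6,7,8", "9,10,11,12"]
converted = [to_ctrl_a(line) for line in sample_input]

for line in converted:
    print(show_caret(line))
```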
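Update:
The output of the Pig script above is stored in a file named 'part-m-00000'. I tried loading this file into Hive. Everything worked and I did not see any issues. No ROW FORMAT clause is needed here because Ctrl-A ('\001') is Hive's default field delimiter for text tables.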
hive> create table test_hive(f1 INT,f2 INT,f3 INT,f4 INT);
OK
Time taken: 0.154 seconds
hive> load data local inpath 'part-m-00000' overwrite into table test_hive;
OK
Time taken: 0.216 seconds
hive> select *from test_hive;
OK
1 2 3 4
5 6 7 8
9 10 11 12
Time taken: 0.076 seconds
hive>
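LOAD DATA only moves the file into the table's location, so a malformed record shows up later as NULL values at query time, not as a load error. If that is a concern, the part file can be checked before loading. The Python sketch below is one possible way to do it. read_ctrl_a_rows is a hypothetical helper, and the sample file is recreated in a temporary directory so the snippet is self-contained.

```python
import os
import tempfile

CTRL_A = "\u0001"


def read_ctrl_a_rows(path, expected_fields):
    """Parse a Ctrl-A delimited text file into rows of ints (illustrative helper)."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split(CTRL_A)
            if len(fields) != expected_fields:
                raise ValueError(
                    f"line {line_number}: expected {expected_fields} fields, "
                    f"got {len(fields)}"
                )
            rows.append([int(field) for field in fields])
    return rows


# Recreate the sample part file from the example above.
sample_records = [["1", "2", "3", "4"], ["5", "6", "7", "8"], ["9", "10", "11", "12"]]
with tempfile.TemporaryDirectory() as workdir:
    part_file = os.path.join(workdir, "part-m-00000")
    with open(part_file, "w", encoding="utf-8") as handle:
        for record in sample_records:
            handle.write(CTRL_A.join(record) + "\n")
    rows = read_ctrl_a_rows(part_file, expected_fields=4)

print(rows)
```

The check raises ValueError on the first record whose field count differs from the table's column count, or whose values are not integers.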