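How do I store Pig output in a Hive table?

Asked: 2017-05-02 06:37:56

Tags: csv azure hadoop apache-pig

I have an HDInsight cluster on Azure, with .csv files in HDFS (Azure Storage).

I want to process these files with apache-pig and store the output in a Hive table. To do this, I wrote the following script: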

A = LOAD '/test/input/t12007.csv' USING PigStorage(',') AS (year:chararray,ArrTime:chararray,DeptTime:chararray);
describe A;
dump A;
store A into 'testdb.tbl3' using org.apache.hive.hcatalog.pig.HCatStorer();

The script loads the file, describes the schema, and displays the data with dump, but the store command throws the following error:

2017-05-02 06:18:41,476 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse: <file script.pig, line 4, column 33> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
Caused by: <file script.pig, line 4, column 33> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
2017-05-02 06:18:41,484 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 

1 answer:

Answer 0 (score: 0)

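From the Pig HCatalog documentation:

Running Pig with HCatalog

Pig does not automatically pick up HCatalog jars. To bring in the necessary jars, you can either use a flag in the pig command or set the environment variables PIG_CLASSPATH and PIG_OPTS, as described in that documentation. To bring in the appropriate jars for working with HCatalog, simply include the following flag:

pig -useHCatalog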
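For the script in the question, the flag goes on the pig command line ahead of the script file. The sketch below wraps that in a small shell function; the function name run_with_hcatalog is hypothetical, and script.pig is the file name that appears in the error log above.

```shell
# Hypothetical helper: run a Pig script with the HCatalog jars on the classpath.
# Everything passed to the function is forwarded to pig after the flag.
run_with_hcatalog() {
    pig -useHCatalog "$@"
}

# Example call (needs pig on the PATH, so it is left commented out here):
# run_with_hcatalog script.pig
```

With the flag in place, Pig should be able to resolve org.apache.hive.hcatalog.pig.HCatStorer without a REGISTER statement in the script.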

Alternative:

Find the location of the HCatalog jar and add a REGISTER statement with the jar path at the top of the script, as shown below.

REGISTER /usr/username/client/lib/hive-hcatalog-core-1.2.1.2.3.0.0-2557.jar;

The path may differ depending on how your cluster is installed. You can find the jar's location with the following command: locate *hcatalog-core*
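If locate is unavailable or its database is stale, find can do the same job. The following is a minimal sketch, not part of the original answer: the function name hcatalog_register_line and the search root passed to it are assumptions, and it simply prints a REGISTER line for the first matching jar so it can be pasted at the top of the script.

```shell
# Hypothetical helper: print a REGISTER statement for the first
# hive-hcatalog-core jar found under the given directory.
hcatalog_register_line() {
    search_root="$1"
    # Search recursively; sort so the choice is stable when several jars match.
    jar_path=$(find "$search_root" -name 'hive-hcatalog-core*.jar' 2>/dev/null | sort | head -n 1)
    if [ -z "$jar_path" ]; then
        echo "no hive-hcatalog-core jar found under $search_root" >&2
        return 1
    fi
    printf 'REGISTER %s;\n' "$jar_path"
}

# Example call (the search root /usr is an assumption; adjust for your cluster):
# hcatalog_register_line /usr
```

Note that find does not follow symbolic links to directories by default, so point it at the real installation directory if your distribution links to it.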

HCatStorer

HCatStorer is used with Pig scripts to write data to HCatalog-managed tables.

Usage

HCatStorer is accessed via a Pig store statement:

STORE A INTO 'tablename'
   USING org.apache.hive.hcatalog.pig.HCatStorer();