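I have an HDInsight cluster on Azure, with .csv files in HDFS (Azure Storage).
I want to process these files with Apache Pig and store the output in a Hive table. To do this, I wrote the following script: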
A = LOAD '/test/input/t12007.csv' USING PigStorage(',') AS (year:chararray,ArrTime:chararray,DeptTime:chararray);
describe A;
dump A;
store A into 'testdb.tbl3' using org.apache.hive.hcatalog.pig.HCatStorer();
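The script loads the file, describes the schema, and displays the data with dump successfully, but the store command throws the following error: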
2017-05-02 06:18:41,476 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse: <file script.pig, line 4, column 33> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Caused by: <file script.pig, line 4, column 33> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
2017-05-02 06:18:41,484 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Answer 0 (score: 0)
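From the Pig HCatalog documentation:
Running Pig with HCatalog
Pig does not automatically pick up the HCatalog jars. To bring in the necessary jars, you can either pass a flag to the pig command or set the environment variables PIG_CLASSPATH and PIG_OPTS, as described in that documentation. To bring in the appropriate jars for working with HCatalog, simply pass the following flag when launching Pig:
pig -useHCatalog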
Alternative approach:
Find the location of the HCatalog jar and add a REGISTER statement with the jar's path at the top of the script, as shown below.
REGISTER /usr/username/client/lib/hive-hcatalog-core-1.2.1.2.3.0.0-2557.jar;
The path may differ depending on how your cluster is installed. You can find the jar's location with the following command: locate '*hcatalog-core*'
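If locate is not installed on the node or its database is stale, find can do the same search. The following is a minimal sketch, not something taken from the Pig or HCatalog documentation: the function names find_hcat_jar and register_line and the default search root /usr are assumptions made for illustration, and the jar may live elsewhere on your cluster.

```shell
#!/bin/sh
# Sketch: locate the hive-hcatalog-core jar with find(1) and print a
# REGISTER statement for it. The function names and the default search
# root (/usr) are illustrative assumptions, not part of Pig or HCatalog.

# Print the path of the first hive-hcatalog-core jar found under $1.
find_hcat_jar() {
    find "${1:-/usr}" -name 'hive-hcatalog-core*.jar' 2>/dev/null | head -n 1
}

# Print a Pig REGISTER statement for that jar; return 1 if none is found.
register_line() {
    jar="$(find_hcat_jar "$1")"
    if [ -z "$jar" ]; then
        echo "no hive-hcatalog-core jar found under ${1:-/usr}" >&2
        return 1
    fi
    printf 'REGISTER %s;\n' "$jar"
}
```

After sourcing these functions, running register_line /usr prints a REGISTER line that can be pasted at the top of the Pig script. If more than one version of the jar is installed, only the first match is printed, so check that it is the version your cluster actually uses.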
HCatStorer
HCatStorer is used with Pig scripts to write data to HCatalog-managed tables.
Usage
HCatStorer is accessed via a Pig store statement.
STORE A INTO 'tablename'
USING org.apache.hive.hcatalog.pig.HCatStorer();