从Google Cloud Dataproc提交Pig作业不会将自定义jar添加到Pig类路径

时间:2017-03-29 16:36:44

标签: apache-pig google-cloud-dataproc google-cloud-bigtable

我试图通过Google Cloud Dataproc提交Pig工作,并包含一个自定义jar,它实现了我在Pig脚本中使用的自定义加载功能,但我无法了解如何执行此操作。

通过UI添加我的自定义jar似乎不会将其添加到Pig类路径。

这里是Pig工作的输出,显示它找不到我的课程:

17/03/29 16:12:21 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-03-29 16:12:21,961 [main] INFO  org.apache.pig.Main - Apache Pig version 0.16.0 (r: unknown) compiled Nov 27 2016, 23:14:51
2017-03-29 16:12:21,961 [main] INFO  org.apache.pig.Main - Logging error messages to: /tmp/cb3b0696-3f30-4db4-a6a7-bb716d2a8a89/pig_1490803941959.log
2017-03-29 16:12:22,379 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-03-29 16:12:22,379 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-29 16:12:22,379 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://aspen-dp-central-m
2017-03-29 16:12:22,404 [main] INFO  com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase - GHFS version: 1.6.0-hadoop2
2017-03-29 16:12:22,890 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-e53a2851-efe5-4e74-bf33-89dfe0733386
2017-03-29 16:12:22,890 [main] WARN  org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
2017-03-29 16:12:23,247 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse: 
<line 8, column 13> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1819)
    at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1527)
    at org.apache.pig.PigServer.parseAndBuild(PigServer.java:460)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:485)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:471)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:172)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:742)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:532)
    at org.apache.pig.Main.main(Main.java:176)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: 
<line 8, column 13> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1339)
    at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1324)
    at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5184)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3515)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
    ... 19 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:671)
    at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1336)
    ... 27 more
2017-03-29 16:12:23,251 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /tmp/cb3b0696-3f30-4db4-a6a7-bb716d2a8a89/pig_1490803941959.log
2017-03-29 16:12:23,269 [main] INFO  org.apache.pig.Main - Pig script completed in 1 second and 477 milliseconds (1477 ms)
Job output is complete

1 个答案:

答案 0 :(得分:1)

在Pig脚本中注册自定义jar解决了这个问题。 所以,基本上:

  1. 将我的jar文件添加到Google存储
  2. 在剧本
  3. 中注册了jar
  4. 通过以下UI或命令行提交Pig作业:
  5. gcloud dataproc jobs提交pig --cluster eduboom-central --file custom.pig  --jars = GS://eduboom-dataproc/custom/eduboom.jar

    custom.pig:

    register eduboom.jar;
    raw = LOAD 'hbase://eduboom_table'
       USING com.eduboom.pig.load.HBaseMultiScanLoader('2017-03-30T14:00Z_00', '2017-03-30T14:01Z_25', 'cf:*')
       AS (key:chararray, data);
    DUMP raw;