使用PigServer运行mapreduce作业

时间:2014-05-05 11:16:36

标签: java hadoop apache-pig cloudera

我想使用PigServer java类在远程hadoop集群上运行pig脚本。目前我正在使用cloudera cdh4.4.0发行版测试pusposes。我在cloudera VM中运行我的java程序。

具有hadoop配置文件的目录包含在类路径中。

当我在shell中的map reduce模式下运行相同的脚本时,它可以正常工作。提前感谢您的回答。

java代码:

Properties props = new Properties();
props.setProperty("fs.default.name", "hdfs://localhost.localdomain:8020");
props.setProperty("mapred.job.tracker", "localhost.localdomain:8021");

PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);

pigServer.registerQuery("batting = load './Batting.csv' using PigStorage(',');");
pigServer.registerQuery("runs_raw = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;");
pigServer.registerQuery("runs = FILTER runs_raw BY runs > 0;");
pigServer.registerQuery("grp_data = GROUP runs by (year);");
pigServer.registerQuery("max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;");
pigServer.registerQuery("join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);");
pigServer.registerQuery("join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;");
pigServer.store("join_data", "join_data");

我还添加了以下maven依赖

<dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.2.0</version> </dependency>

运行我的java程序后,我得到以下堆栈跟踪:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration.deprecation).
        log4j:WARN Please initialize the log4j system properly.
        log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
        *Exception in thread "main" org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unable to check name hdfs://localhost.localdomain:8020/user/cloudera*
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1608)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1547)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:518)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:531)
        at MainTestPigServer.main(MainTestPigServer.java:24)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
        Caused by: Failed to parse: Pig script failed to parse:
<line 1, column 10> pig script failed to validate: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1600)
        ... 9 more
        Caused by:
<line 1, column 10> pig script failed to validate: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
        at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:835)
        at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3236)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
        ... 10 more
        Caused by: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:207)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:128)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:138)
        at org.apache.pig.parser.QueryParserUtils.getCurrentDir(QueryParserUtils.java:91)
        at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:827)
        ... 16 more
        Caused by: java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1351)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at $Proxy9.getFileInfo(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at $Proxy9.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:200)
        ... 20 more
        Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
        at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
        at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
        at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:1398)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:1362)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:1492)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:1487)
    at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
    at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:2364)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:996)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)

Process finished with exit code 1

0 个答案:

没有答案