Running a Hadoop MapReduce job remotely causes EOFException?

Asked: 2012-11-06 17:59:51

Tags: java hadoop mapreduce remote-access cloudera

I have written a Hadoop map-reduce program, and now I want to test it against the Cloudera Hadoop distribution running in a VirtualBox virtual machine on the same computer.

Here is how I submit the map-reduce job:

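import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class AvgCounter extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Creating the Cluster opens the RPC connection to the JobTracker.
        Job mrJob = Job.getInstance(new Cluster(getConf()), getConf());
        mrJob.setJobName("Average count");

        mrJob.setJarByClass(AvgCounter.class);
        mrJob.setOutputKeyClass(IntWritable.class);
        mrJob.setOutputValueClass(Text.class);
        mrJob.setMapperClass(AvgCounterMap.class);
        mrJob.setCombinerClass(AvgCounterReduce.class);
        mrJob.setReducerClass(AvgCounterReduce.class);
        mrJob.setInputFormatClass(TextInputFormat.class);
        mrJob.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(mrJob, new Path("/user/test/testdata.csv"));
        FileOutputFormat.setOutputPath(mrJob, new Path("/user/test/result.txt"));
        mrJob.setWorkingDirectory(new Path("/tmp"));
        // Return 0 on success so the process exit code follows convention.
        return mrJob.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.5.50:9000");
        conf.set("mapreduce.jobtracker.address", "192.168.5.50:9001");
        System.exit(ToolRunner.run(conf, new AvgCounter(), args));
    }
}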

AvgCounterMap has an empty map method and AvgCounterReduce has an empty reduce method; neither does anything. When I try to run the main method, I get the following exception:

Exception in thread "main" java.io.IOException: Call to /192.168.5.50:9001 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1063)
    at org.apache.hadoop.ipc.Client.call(Client.java:1031)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:235)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:275)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:249)
    at org.apache.hadoop.mapreduce.Cluster.createRPCProxy(Cluster.java:86)
    at org.apache.hadoop.mapreduce.Cluster.createClient(Cluster.java:98)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:74)
    at eu.xxx.mapred.AvgCounter.run(AvgCounter.java:22)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at eu.xxx.mapred.AvgCounter.main(AvgCounter.java:53)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:760)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:698) 

The virtual Cloudera machine running Hadoop has the following in the file /etc/hadoop/conf/core-site.xml:

<property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.5.50:9000</value>
</property> 

and the following in the file /etc/hadoop/conf/mapred-site.xml:

<property>
     <name>mapred.job.tracker</name>
     <value>192.168.5.50:9001</value>
</property>

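I also checked the connection to the virtual machine by entering 192.168.5.50:50030 in my web browser, and I got the Hadoop Map/Reduce Administration page as expected. So what causes this exception, and how do I get rid of it?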
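
Note that port 50030 serves the JobTracker web UI over HTTP, while the job client talks to the RPC ports configured above (9000 and 9001), so the browser check does not exercise the ports the client actually uses. A minimal sketch of a plain TCP reachability check is shown here; the class name PortCheck and the 2000 ms timeout are illustrative choices, not part of the original setup. A successful connection only shows that something is listening, not that it speaks the same RPC protocol version.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unroutable: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and ports taken from the configuration in the question.
        String host = "192.168.5.50";
        int[] ports = {9000, 9001, 50030};
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable: "
                    + isReachable(host, port, 2000));
        }
    }
}
```

If 9001 is reachable but the job client still fails with EOFException, the connection itself is fine and the failure happens during the RPC exchange.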

Thanks for any ideas.

1 Answer:

Answer 0: (score: 0)

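The problem was that the client was using a different version of the Hadoop API (0.23.0) than the Hadoop installation on the virtual machine. With mismatched versions, the initial RPC exchange (the getProtocolVersion call in the stack trace) most likely cannot complete, and the client sees the closed connection as an EOFException. Build and run the client against the same Hadoop version as the cluster.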
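
To confirm which Hadoop version the client is actually picking up, something like the following sketch can help. It calls org.apache.hadoop.util.VersionInfo.getVersion() through reflection so that it also runs when no Hadoop jar is on the classpath; the class name HadoopClientVersion and the "not found" fallback string are illustrative, not from the original post.

```java
import java.lang.reflect.Method;

public class HadoopClientVersion {

    /**
     * Returns the version reported by the Hadoop client library on the
     * classpath, or "not found" if no Hadoop jar is present.
     */
    static String detect() {
        try {
            // VersionInfo ships with Hadoop; it is loaded reflectively so this
            // class compiles and runs even without Hadoop jars.
            Class<?> versionInfo = Class.forName("org.apache.hadoop.util.VersionInfo");
            Method getVersion = versionInfo.getMethod("getVersion");
            return String.valueOf(getVersion.invoke(null));
        } catch (ReflectiveOperationException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        System.out.println("Hadoop client library version: " + detect());
    }
}
```

Run it with the same classpath as the job client and compare the result with the output of `hadoop version` on the virtual machine; the two should match.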