Hadoop MapReduce job: ClassNotFound on the input file

Date: 2012-06-23 21:01:36

Tags: hadoop mapreduce hadoop-streaming

I am building a sample Map/Reduce job on a Hadoop cluster with two nodes: a master/slave and a slave. Here is my setup:

$HADOOP_HOME = /usr/local/hadoop
My M/R classfiles path = $HADOOP_HOME/MyMapRed_classes
My Mapper classfile = $HADOOP_HOME/MyMapRed_classes/MyMapper
My Reducer classfile = $HADOOP_HOME/MyMapRed_classes/MyReducer
My Jar path = $HADOOP_HOME/MyMapred/MyMapRed.jar
My HDFS Input Path = /user/hadoop/MyMapRed/inputfile
My HDFS Output Path = /user/hadoop/MyMapRed_output

I am running the M/R job as follows:

<myusername>@localhost:/usr/local/hadoop$ bin/hadoop jar $HADOOP_HOME/MyMapRed/MyMapRed.jar \
    -input /user/hadoop/MyMapRed/inputfile \
    -output /user/hadoop/MyMapRed_output/ \
    -mapper $HADOOP_HOME/MyMapRed_classes/MyMapper \
    -reducer $HADOOP_HOME/MyMapRed_classes/MyReducer

But it fails, apparently unable to find the input file, as the message below shows:
Exception in thread "main" java.lang.ClassNotFoundException: -input
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

Below is the MyMapRed class I am using. It takes a list of key/value pairs as input; the reducer should emit the average Val for each group.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;


public class MyMapRed {

public static class MyMapper extends MapReduceBase 
implements Mapper<Text, Text, Text, DoubleWritable> {

    private final static Text Group = new Text();
    private final static DoubleWritable Val = new DoubleWritable();

    public void map(Text key, Text value, OutputCollector<Text, DoubleWritable> output, Reporter reporter) 
        throws IOException {
        String line = value.toString();
        String[] KeyAndVal = line.split("\t",2);
        Group.set(KeyAndVal[0]);
        Val.set(Double.valueOf(KeyAndVal[1]));
        output.collect(Group, Val);
    }
}

public static class MyReducer extends MapReduceBase 
implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    public void reduce(Text key, Iterator<DoubleWritable> values,
            OutputCollector<Text, DoubleWritable> output, Reporter reporter)
            throws IOException {
        DoubleWritable val = new DoubleWritable();
        double valSum = 0.0;
        int valCnt = 0;
        while (values.hasNext()) {
            val = values.next();
            valSum += val.get();
            valCnt++;
        }
        if (valCnt>0)
            valSum = valSum/valCnt;
        output.collect(key, new DoubleWritable(valSum));
    }
}


public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(MyMapRed.class);
    conf.setJobName("MyMapRed");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(DoubleWritable.class);

    conf.setMapperClass(MyMapper.class);
    conf.setReducerClass(MyReducer.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.addInputPath(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));

    client.setConf(conf);

    try {
        JobClient.runJob(conf);
    } catch (Exception e) {
        e.printStackTrace();
    }

}
}

Can anyone suggest what I am missing that causes the ClassNotFoundException?

1 Answer:

Answer 0 (score: 3)

If your MyMapRed.jar file does not have a main class defined in the jar manifest, then Hadoop expects the argument right after the jar to be the name of a class containing a static main method.

You need to name the main class on the command line (the class that contains the public static void main(String[] args) method and configures the job via the JobConf or Job class).
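A minimal sketch of the two possible fixes, assuming MyMapRed is in the default package and the compiled classes live under $HADOOP_HOME/MyMapRed_classes (these commands target your cluster, so they are illustrative, not verified here):

```shell
# Fix 1: pass the driver class name right after the jar
bin/hadoop jar $HADOOP_HOME/MyMapRed/MyMapRed.jar MyMapRed \
    -input /user/hadoop/MyMapRed/inputfile \
    -output /user/hadoop/MyMapRed_output

# Fix 2: bake the entry point into the jar manifest when building it
# (the "e" flag of the jar tool sets Main-Class), so no class name
# is needed at run time
jar cfe MyMapRed.jar MyMapRed -C $HADOOP_HOME/MyMapRed_classes .
bin/hadoop jar MyMapRed.jar \
    -input /user/hadoop/MyMapRed/inputfile \
    -output /user/hadoop/MyMapRed_output
```

Note that, as posted, MyMapRed.main() ignores its arguments and hardcodes the "input" and "output" paths, so the -input/-output flags will have no effect until the driver actually reads them.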

If you were trying to use Hadoop Streaming: the -mapper/-reducer options are not needed here, since your mapper and reducer are written in Java and wired up in the driver.
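Separately, the posted driver hardcodes new Path("input") and new Path("output"), ignoring whatever -input/-output flags are passed. Hadoop's ToolRunner/GenericOptionsParser is the idiomatic way to handle job arguments; the standalone, hypothetical helper below (ArgParser and parseFlags are illustrative names, not Hadoop APIs) just sketches the kind of parsing the driver could do before calling FileInputFormat.addInputPath and FileOutputFormat.setOutputPath:

```java
import java.util.HashMap;
import java.util.Map;

public class ArgParser {
    // Turn "-flag value" pairs (e.g. -input /path -output /path)
    // into a flag-name -> value map; the leading '-' is stripped.
    public static Map<String, String> parseFlags(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].startsWith("-")) {
                flags.put(args[i].substring(1), args[i + 1]);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        Map<String, String> f = parseFlags(new String[] {
            "-input", "/user/hadoop/MyMapRed/inputfile",
            "-output", "/user/hadoop/MyMapRed_output"
        });
        // In the real driver these values would feed
        // FileInputFormat.addInputPath(conf, new Path(f.get("input")))
        // and FileOutputFormat.setOutputPath(conf, new Path(f.get("output"))).
        System.out.println(f.get("input"));
        System.out.println(f.get("output"));
    }
}
```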