MapReduce: WordCount does nothing

Time: 2014-03-03 16:59:28

Tags: java hadoop mapreduce java-6

I want to write my own word count example with MapReduce and Hadoop v1.0.3 (I'm on MacOS), but I don't understand why it doesn't work. Here is my code:

Main:

package org.myorg;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        // set job name, mapper, combiner, and reducer classes
        conf.setJobName("WordCount");
        // set input and output formats
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // set input and output paths
        //FileInputFormat.setInputPaths(conf, new Path(input));
        //FileOutputFormat.setOutputPath(conf, new Path(output));
        FileOutputFormat.setCompressOutput(conf, false);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(org.myorg.Map.class);
        conf.setReducerClass(org.myorg.Reduce.class);
        String host = args[0];
        String input = host + "/" + args[1];
        String output = host + "/" + args[2];
        // set input and output paths
        FileInputFormat.addInputPath(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));
        JobClient j = new JobClient(conf);
        (j.submitJob(conf)).waitForCompletion();
    }
}

Mapper:

package org.myorg;

import java.io.IOException;
import java.util.HashMap;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.Vector;
import java.util.Map.Entry;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Mapper.Context;

public class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>  {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value,
    OutputCollector<Text, IntWritable> output, Reporter reporter)
    throws IOException {

        MapWritable hs = new MapWritable();
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            //hs.put(word, one);
            output.collect(word,one);
        }
        // TODO Auto-generated method stub
    }
}

Reducer:

package org.myorg;

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.math.RoundingMode;
import java.net.URI;
import java.net.URISyntaxException;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.Map.Entry;
import java.util.Vector;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Reducer.Context;


//public class Reduce extends MapReduceBase implements Reducer<Text, MapWritable, Text, Text> {
public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, Text> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        String host = "hdfs://localhost:54310/";
        String tmp = host + "Temporany/output.txt";
        FileSystem srcFS;
        try {
            srcFS = FileSystem.get(new URI(tmp), new JobConf());
            srcFS.delete(new Path(tmp), true);
            BufferedWriter wr = new BufferedWriter(new OutputStreamWriter(
            srcFS.create(new Path(tmp))));
            wr.write(key.toString() + ":" + sum);
            wr.close();
            srcFS.close();
        } catch (URISyntaxException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        //   context.write(key, new IntWritable(sum));
    }
}

The job starts and finishes with no errors, but no output file is written. I launch the jar with this Hadoop command:

./Hadoop jar /Users/User/Desktop/hadoop/wordcount.jar hdfs://localhost:54310 /In/testo.txt /Out/wordcount17

This is the output:

2014-03-03 17:56:22.063 java[6365:1203] Unable to load realm info from SCDynamicStore
14/03/03 17:56:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/03 17:56:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/03 17:56:23 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/03 17:56:23 INFO mapred.FileInputFormat: Total input paths to process : 1

I think the problem is the "Unable to load native-hadoop library" warning, but other jars work fine with it.

1 Answer:

Answer 0 (score: 2):

Q: The job starts and ends without errors, but no output file is written??

Ans: I'm not sure the job really finished without errors.

Problems:

  • Job configuration

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    
  

setOutputKeyClass() and setOutputValueClass() control the output types for both the map and the reduce function, which are often the same. If they differ, the map output types can be set separately with setMapOutputKeyClass() and setMapOutputValueClass().

The output classes in your case are:

Map key:      Text
Map value:    IntWritable
Reduce key:   Text
Reduce value: Text

This will cause a Type mismatch exception.
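A minimal sketch of one possible driver fix, assuming you want to keep the reducer's Text output value: with the old mapred API you can declare the map output types separately, so they no longer have to match the job's final output types:

    conf.setMapOutputKeyClass(Text.class);          // map emits <Text, IntWritable>
    conf.setMapOutputValueClass(IntWritable.class);
    conf.setOutputKeyClass(Text.class);             // reduce emits <Text, Text>
    conf.setOutputValueClass(Text.class);

(The simpler fix is to make the reducer emit IntWritable instead, as sketched under the next point.)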

  • Reduce

I'm not sure why you are using the HDFS API to write the output to a file.

You should use output.collect(key, value) instead.

If there is more than one reducer, are you handling the concurrent writes to the same file? Also, I wonder what you expect context.write to do in the old API (it is commented out)?
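A minimal sketch of the reducer rewritten to emit through the framework instead of writing to HDFS by hand; this version emits IntWritable values, which already matches the driver's setOutputValueClass(IntWritable.class) above:

package org.myorg;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Let the framework write to the job's output path; no manual
        // HDFS writes, so multiple reducers cannot clobber one file.
        output.collect(key, new IntWritable(sum));
    }
}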


You can get more information by debugging the map-reduce job.


Q: What is the difference between submitJob() and waitForCompletion()?

Ans: submitJob(): submits the job and returns.

waitForCompletion(): submits the job and prints the job's status to the console. So waitForCompletion() is submitJob() plus status updates until the job completes.
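In the old mapred API the usual one-liner for this is JobClient.runJob(conf); a sketch, replacing the submitJob()/waitForCompletion() pair in main(). It blocks, prints progress to the console, and throws an IOException if the job fails, so a silently failing job becomes visible:

    // runJob() submits the job, reports progress, and throws on failure.
    JobClient.runJob(conf);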


Word Count

Please read:

  • Map Reduce Apache
  • You can also find hadoop-examples-X.X.X.jar in the installation folder; a sample invocation is shown after the note below.
  • Browse $HADOOP_HOME/src/examples/ for the source code.

  

** $HADOOP_HOME = Hadoop installation folder
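For comparison, the stock WordCount can be run straight from that jar; the jar name, version, and output directory below are illustrative (the output directory must not already exist):

./hadoop jar $HADOOP_HOME/hadoop-examples-1.0.3.jar wordcount hdfs://localhost:54310/In/testo.txt hdfs://localhost:54310/Out/wordcount18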