Hadoop - MapReduce is not reducing

Time: 2016-04-16 11:10:58

Tags: java hadoop

I am trying to reduce a map output like this:

01 true
01 true
01 false
02 false
02 false

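The first column is a Text and the second column is a BooleanWritable. The goal is to keep only those keys whose values are all false, and then write the two digits of the first column as a pair (so for the input above the output would be 0, 2). For this I wrote the following reducer: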

import java.io.IOException;

import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class BeadReducer extends Reducer<Text, Text, Text, Text> {

    public void reduce(Text _key, Iterable<BooleanWritable> values, Context context) throws IOException, InterruptedException {
        // process values
        boolean dontwrite= false;
        for (BooleanWritable val : values) {
            dontwrite = (dontwrite || val.get());
        }
        if (!dontwrite) {
            context.write(new Text(_key.toString().substring(0,1)), new Text(_key.toString().substring(1,2)));
        }
        else {
            context.write(new Text("not"), new Text("good"));
        }

    }

}

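However, this does nothing. It writes neither the pairs nor "not good", as if it never even entered the if-else branch. All I get is the mapped values (the mapping works as expected).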

Driver:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BeadDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "task2");
        job.setJarByClass(hu.pack.task2.BeadDriver.class);
        // TODO: specify a mapper
        job.setMapperClass(hu.pack.task2.BeadMapper.class);
        // TODO: specify a reducer
        job.setReducerClass(hu.pack.task2.BeadReducer.class);

        // TODO: specify output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(BooleanWritable.class);

        // TODO: specify input and output DIRECTORIES (not files)
        FileInputFormat.setInputPaths(job, new Path("local"));
        FileOutputFormat.setOutputPath(job, new Path("outfiles"));

        FileSystem fs;
        try {
            fs = FileSystem.get(conf);
            if (fs.exists(new Path("outfiles")))
                fs.delete(new Path("outfiles"),true);
        } catch (IOException e1) {
            e1.printStackTrace();
        }

        if (!job.waitForCompletion(true))
            return;
    }

}

Mapper:

import java.io.IOException;

import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BeadMapper extends Mapper<LongWritable, Text, Text, BooleanWritable > {
    private final Text wordKey = new Text("");

    public void map(LongWritable ikey, Text value, Context context) throws IOException, InterruptedException {
        String[] friend = value.toString().split(";");
        String[] friendswith = friend[1].split(",");
        for (String s : friendswith) {
            wordKey.set(friend[0] + s);
            context.write(wordKey, new BooleanWritable(true));
            wordKey.set(s + friend[0]);
            context.write(wordKey, new BooleanWritable(true));
        }
        if (friendswith.length > 0) {
            for(int i = 0; i < friendswith.length-1; ++i) {
                for(int j = i+1; j < friendswith.length; ++j) {
                    wordKey.set(friendswith[i] + friendswith[j]);
                    context.write(wordKey, new BooleanWritable(false));
                }
            }
        }
    }

}

What is the problem? What am I missing?

1 answer:

Answer 0 (score: 0)

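The mapper's output key and value types must be the reducer's input types. Your class is declared as Reducer<Text, Text, Text, Text>, but its reduce method takes Iterable<BooleanWritable>, so it does not override Reducer.reduce. Hadoop therefore runs the default identity reduce and your code is never called. In your case the reducer must extend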

Reducer<Text, BooleanWritable, Text, BooleanWritable>
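
The effect of a mismatched parameter type can be reproduced without Hadoop. The sketch below is only an illustration: BaseReducer, MismatchedReducer and MatchingReducer are hypothetical stand-in classes, not Hadoop APIs. It shows that a method whose parameter types differ from the declared type arguments is an overload rather than an override, so the caller in the base class never reaches it.

```java
import java.util.List;

// Hypothetical stand-in for a framework base class whose default reduce
// simply passes the values through.
class BaseReducer<K, V> {
    String reduce(K key, Iterable<V> values) {
        return "identity";
    }

    // Plays the role of the framework, which always calls reduce(K, Iterable<V>).
    String run(K key, List<V> values) {
        return reduce(key, values);
    }
}

// Declared with V = String, but reduce takes Iterable<Boolean>.
// This compiles as an overload, so run() still calls the base method.
class MismatchedReducer extends BaseReducer<String, String> {
    String reduce(String key, Iterable<Boolean> values) {
        return "custom";
    }
}

// The type arguments match the method signature, so @Override compiles
// and run() reaches the custom code.
class MatchingReducer extends BaseReducer<String, Boolean> {
    @Override
    String reduce(String key, Iterable<Boolean> values) {
        return "custom";
    }
}
```

Adding @Override to the reduce method in MismatchedReducer would turn the silent mismatch into a compile error, which is one way to catch this kind of problem early.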

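setOutputKeyClass and setOutputValueClass set the types of the job output, that is, for both map and reduce. If you want to specify different types for the mapper, use the methods setMapOutputKeyClass and setMapOutputValueClass.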
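
As a rough sketch, the driver settings could look like the fragment below. It is not a standalone program: it is meant to replace the two output-type lines in BeadDriver.main and needs the Hadoop client libraries on the classpath. Since the reduce method in the question writes Text values, the fragment assumes the reducer's output value type is Text rather than BooleanWritable; adjust this if your reducer emits something else.

```java
// Fragment for BeadDriver.main (uses the existing `job` variable).
// Assumption: BeadReducer is declared as Reducer<Text, BooleanWritable, Text, Text>.

// Types emitted by the mapper and consumed by the reducer.
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(BooleanWritable.class);

// Types emitted by the reducer, i.e. the final job output.
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
```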

As a side note, why emit the true values from the mapper when you do not want them in the output? Also, in the following code in the reducer,

for (BooleanWritable val : values) {
    dontwrite = (dontwrite || val.get());
}

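once dontwrite becomes true, it stays true until the end of the loop. You may want to change the logic to optimize this, for example by leaving the loop at the first true value.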
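
One possible shape for the early exit is sketched below in plain Java, without Hadoop types. FalseOnlyKeys and its methods are hypothetical names used only for illustration, and Boolean stands in for BooleanWritable. The check stops at the first true value, and keepFalseOnly models the reduce phase over already grouped keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class FalseOnlyKeys {

    // Returns true as soon as one true value is seen; later values are not read.
    static boolean containsTrue(Iterable<Boolean> values) {
        for (Boolean val : values) {
            if (val) {
                return true;
            }
        }
        return false;
    }

    // Models the reduce phase: one containsTrue check per grouped key.
    // Keys with only false values are written as "<first char> <second char>".
    static List<String> keepFalseOnly(Map<String, List<Boolean>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Boolean>> entry : grouped.entrySet()) {
            if (!containsTrue(entry.getValue())) {
                String key = entry.getKey();
                out.add(key.substring(0, 1) + " " + key.substring(1, 2));
            }
        }
        return out;
    }
}
```

With the sample input from the question, key 01 has a true value and is dropped, while key 02 has only false values and is kept as the pair of its two digits.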