Hadoop: not all values get assembled for one key

Asked: 2014-12-02 20:40:49

Tags: hadoop, mapreduce

I have some data that I want to aggregate by key in my Mapper code, and then perform some operation on all the values belonging to a given key in my Reducer code. For example, if I have:

key = 1, val = 1
key = 1, val = 2
key = 1, val = 3

then I want to receive key = 1, val = [1, 2, 3] in my Reducer.

The problem is that I am getting something like this instead:

key = 1, val = [1, 2]
key = 1, val = [3]

Why does this happen?

I thought that all the values for one particular key would be grouped into a single reducer call, but now it seems there can be more than one (key, val[]) pair, because there can be multiple reducers. Is that right?
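
From what I have read, the default HashPartitioner decides which reduce task a key goes to, and identical keys should always land on the same one. This is my paraphrase of its logic (my own assumption, not copied from the Hadoop source):

// roughly what org.apache.hadoop.mapreduce.lib.partition.HashPartitioner does:
// the same key always hashes to the same partition, so all of its values
// should end up in the same reduce task
public int getPartition(Text key, Text value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}

So if two output lines really carry the same key, I would expect them to reach the same reducer no matter how many reducers run.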

Should I set the number of reducers to 1?
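
If a single reducer is the answer, I assume I would just add this to the job setup (untested on my side):

job.setNumReduceTasks(1); // funnel every key into one reduce task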

I am new to Hadoop, so this is confusing me.

Here is the code:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SomeJob {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job();
        job.setJarByClass(SomeJob.class);

        FileInputFormat.addInputPath(job, new Path("/home/pera/data/input/some.csv"));
        FileOutputFormat.setOutputPath(job, new Path("/home/pera/data/output"));

        job.setMapperClass(SomeMapper.class);
        job.setReducerClass(SomeReducer.class);

        // map and reduce both emit (Text, Text), so one pair of declarations covers both
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.waitForCompletion(true);
    }
}

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SomeMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] parts = line.split(";");

        // emit the first column as the key and the fifth column as the value
        context.write(new Text(parts[0]), new Text(parts[4]));
    }
}

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SomeReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // concatenate every value that arrived for this key
        StringBuilder properties = new StringBuilder();
        for (Text value : values) {
            properties.append(value).append(" ");
        }

        context.write(key, new Text(properties.toString()));
    }
}
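
One thing I planned to check is whether keys that print identically actually differ by invisible characters (whitespace, a BOM, etc.); this is just my debugging idea, not something from the docs:

// debug: dump the raw key bytes inside reduce(); Text.getBytes() may return a
// longer backing array, so only the first getLength() bytes are valid
byte[] raw = java.util.Arrays.copyOf(key.getBytes(), key.getLength());
System.out.println(key + " -> " + java.util.Arrays.toString(raw));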

0 Answers:

There are no answers yet.