I have some data that I want to group by key in my Mapper code, and then perform some operation in my Reducer code on all the values that belong to a given key. For example, if I have:
key = 1, val = 1
key = 1, val = 2
key = 1, val = 3
then in my Reducer I want to get key = 1, val = [1,2,3].
The problem is that instead I get something like:
key = 1, val = [1,2]
key = 1, val = [3]
Why does this happen?
I thought that all the values for a particular key would be grouped into a single reducer call, but now it seems there can be more than one (key, val[]) pair for the same key, since there can be multiple reducers. Is that right?
Should I set the number of reducers to 1?
I am new to Hadoop, so this is confusing me.
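
If it matters, this is the call I have in mind for pinning the job to a single reduce task (a minimal sketch; setNumReduceTasks is the standard Job method, it would go in the driver shown below):

// Route every key to the same single reduce task (sketch; not in my current driver).
job.setNumReduceTasks(1);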
Here is the code:
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SomeJob {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job();
        job.setJarByClass(SomeJob.class);

        // Input is a CSV file; the output directory must not exist yet.
        FileInputFormat.addInputPath(job, new Path("/home/pera/data/input/some.csv"));
        FileOutputFormat.setOutputPath(job, new Path("/home/pera/data/output"));

        job.setMapperClass(SomeMapper.class);
        job.setReducerClass(SomeReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.waitForCompletion(true);
    }
}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SomeMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the semicolon-separated line; emit field 0 as the key and field 4 as the value.
        String line = value.toString();
        String[] parts = line.split(";");
        context.write(new Text(parts[0]), new Text(parts[4]));
    }
}
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SomeReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Concatenate every value that arrives for this key.
        StringBuilder properties = new StringBuilder();
        for (Text value : values) {
            properties.append(value).append(" ");
        }
        context.write(key, new Text(properties.toString()));
    }
}
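
To illustrate what I expect: my real CSV layout is not shown above, so these input lines are only hypothetical, but with field 0 as the key and field 4 as the value, the flow should look like this:

1;a;b;c;1   ->  map emits (1, 1)
1;a;b;c;2   ->  map emits (1, 2)
1;a;b;c;3   ->  map emits (1, 3)

So for these three records I would expect a single reduce call, writing one output line "1   1 2 3", not two separate lines for the same key.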