I have some data that I want to group by key in my Mapper code, and then perform some operation in my Reducer code on all the values that belong to a given key. For example, if I have:
key = 1, val = 1
key = 1, val = 2
key = 1, val = 3
then in my Reducer I want to get key = 1, val = [1,2,3].
The problem is that instead I get something like:
key = 1, val = [1,2]
key = 1, val = [3]
Why does this happen?
I thought that all the values for a particular key would be grouped into a single reducer call, but now it seems there can be more than one (key, val[]) pair for the same key, since there can be multiple reducers. Is that right?
Should I set the number of reducers to 1?
I am new to Hadoop, so this is confusing me.
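
If it matters, this is the call I have in mind for pinning the job to a single reduce task (a minimal sketch; setNumReduceTasks is the standard Job method, it would go in the driver shown below):

// Route every key to the same single reduce task (sketch; not in my current driver).
job.setNumReduceTasks(1);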
Here is the code:
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SomeJob {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job();
        job.setJarByClass(SomeJob.class);

        // Input is a CSV file; the output directory must not exist yet.
        FileInputFormat.addInputPath(job, new Path("/home/pera/data/input/some.csv"));
        FileOutputFormat.setOutputPath(job, new Path("/home/pera/data/output"));

        job.setMapperClass(SomeMapper.class);
        job.setReducerClass(SomeReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.waitForCompletion(true);
    }
}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SomeMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the semicolon-separated line; emit field 0 as the key and field 4 as the value.
        String line = value.toString();
        String[] parts = line.split(";");
        context.write(new Text(parts[0]), new Text(parts[4]));
    }
}
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SomeReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Concatenate every value that arrives for this key.
        StringBuilder properties = new StringBuilder();
        for (Text value : values) {
            properties.append(value).append(" ");
        }
        context.write(key, new Text(properties.toString()));
    }
}
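
To illustrate what I expect: my real CSV layout is not shown above, so these input lines are only hypothetical, but with field 0 as the key and field 4 as the value, the flow should look like this:

1;a;b;c;1   ->  map emits (1, 1)
1;a;b;c;2   ->  map emits (1, 2)
1;a;b;c;3   ->  map emits (1, 3)

So for these three records I would expect a single reduce call, writing one output line "1   1 2 3", not two separate lines for the same key.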