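I have the values below, which represent a combination of title (key) and month (key): each row holds a title, a month, and a value (sum). For each title, I want to select only the row with the highest value, listing every month that ties for that value, e.g. Fly 08 (09, 11) 4 or Go 06 45, as you can see in my actual output. Please advise if this is possible. If you have any questions, let me know and I will do my best to clarify.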
Fly,07,1
Fly,08,4
Fly,09,4
Fly,10,1
Fly,11,4
Fly,12,2
Gentle Ben,05,2
Gentle Ben,06,3
Gentle Ben,07,2
Gentle Ben,08,2
Gentle Ben,09,2
German aircraft guns and cannons,11,1
Go,04,20
Go,05,29
Go,06,45
Go,07,24
Go,08,28
Go,09,37
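Answer 0 (score: 1)
You need to send the first column to the reducer as the key and the remaining two columns as the value, so that all rows with the same key go to the same reducer, which can then find the maximum. In the reducer, iterate over the rows and keep track of the largest value. If only one row has the maximum, the second column holds a single month; otherwise all the tied months are appended. The code below is for your reference.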
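import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxValueGroupedMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line is "title,month,value": emit the title as key and "month,value" as value.
        String[] val = value.toString().split(",");
        context.write(new Text(val[0]), new Text(val[1] + "," + val[2]));
    }
}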
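import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxValueGroupedReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE; // start below any real value so zero and negative values work too
        String val = null;           // comma-separated months that attain the current maximum
        for (Text txt : values) {
            String[] st = txt.toString().split(",");
            int data = Integer.parseInt(st[1].trim());
            if (val == null || data > max) {
                // New maximum: restart the month list.
                max = data;
                val = st[0];
            } else if (data == max) {
                // Tie with the current maximum: append this month.
                val = val + "," + st[0];
            }
        }
        context.write(key, new Text(val + "," + max));
    }
}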
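import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxValueGroupedDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("mapreduce.job.queuename", "default");
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "MaxValue");
        job.setJarByClass(MaxValueGroupedDriver.class);
        job.setMapperClass(MaxValueGroupedMapper.class);
        job.setReducerClass(MaxValueGroupedReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));  // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path (must not exist yet)
        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}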
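Output for the dataset above (key and value are separated by a tab, the default for text output; the order of tied months depends on the order in which the values reach the reducer):
Fly	08,09,11,4
Gentle Ben	06,3
German aircraft guns and cannons	11,1
Go	06,45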
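The tie-handling logic of the reducer can be checked locally before submitting a job. The following is only a minimal sketch in plain Java with no Hadoop dependency; the class name `MaxPerTitleSketch` and the method `maxPerTitle` are hypothetical names introduced for illustration, and it assumes every input line has the form `title,month,value`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the answer's job: it mimics the
// group-by-title step and the reducer's "keep every month that ties" rule.
class MaxPerTitleSketch {

    // Returns title -> "month[,month...],max", listing every month that attains the maximum.
    static Map<String, String> maxPerTitle(List<String> csvLines) {
        Map<String, Integer> best = new LinkedHashMap<>();
        Map<String, List<String>> months = new LinkedHashMap<>();
        for (String line : csvLines) {
            String[] parts = line.split(",");
            if (parts.length != 3) {
                continue; // assumption: silently skip malformed lines
            }
            String title = parts[0];
            String month = parts[1];
            int value = Integer.parseInt(parts[2].trim());
            Integer current = best.get(title);
            if (current == null || value > current) {
                // New maximum for this title: restart the month list.
                best.put(title, value);
                List<String> fresh = new ArrayList<>();
                fresh.add(month);
                months.put(title, fresh);
            } else if (value == current) {
                // Tie with the current maximum: remember this month as well.
                months.get(title).add(month);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : best.entrySet()) {
            result.put(e.getKey(), String.join(",", months.get(e.getKey())) + "," + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Fly,07,1", "Fly,08,4", "Fly,09,4", "Fly,10,1", "Fly,11,4", "Fly,12,2",
                "Go,05,29", "Go,06,45");
        // Print in the same key<TAB>value shape as the job's text output.
        maxPerTitle(rows).forEach((title, v) -> System.out.println(title + "\t" + v));
    }
}
```

Unlike the MapReduce job, this sketch processes rows in input order, so the tied months always appear in the order they were read.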
Answer 1 (score: 0)
You can read the values in the mapper and compute the maximum in the reducer, like this:
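import java.io.IOException;
import java.util.stream.StreamSupport;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTileValue {
    public static class MaxTileValueMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] row = value.toString().split(",");
            // Only well-formed "tile,month,value" lines are emitted; the month is dropped.
            if (row.length == 3) {
                String tile = row[0];
                String val = row[2];
                context.write(new Text(tile), new IntWritable(Integer.parseInt(val.trim())));
            }
        }
    }

    public static class MaxTileValueReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Stream over the values for this tile and take the largest one.
            int max = StreamSupport.stream(values.spliterator(), false)
                    .mapToInt(IntWritable::get)
                    .max()
                    .orElse(0);
            context.write(key, new IntWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "MaxTileValue");
        job.setJarByClass(MaxTileValue.class); // lets Hadoop locate the jar on the cluster
        job.setMapperClass(MaxTileValueMapper.class);
        job.setReducerClass(MaxTileValueReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}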
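MaxTileValueMapper reads the input file line by line. The key is the byte offset of the line within the file and the value is the line's content. The mapper splits the value and writes the tile name and its value to the context. MaxTileValueReducer then receives a key (the tile name) and the list of values the mapper wrote for it, and computes the maximum.
Also, you should make the input parseable, for example by using CSV format: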
Fly,07,1
Fly,08,4
Fly,09,4
Fly,10,1
Fly,11,4
Fly,12,2
Gentle Ben,05,2
Gentle Ben,06,3
Gentle Ben,07,2
Gentle Ben,08,2
Gentle Ben,09,2
German aircraft guns and cannons,11,1
Go,04,20
Go,05,29
Go,06,45
Go,07,24
Go,08,28
Go,09,37
The output of the MapReduce job for the CSV above is:
Fly 4
Gentle Ben 3
German aircraft guns and cannons 1
Go 45
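For comparison, the same per-tile maximum can be reproduced locally with Java streams. This is only a rough sketch, not part of the job itself: `MaxTileValueLocalSketch` and `maxByTile` are hypothetical names, and like the mapper it assumes three comma-separated fields per line and ignores the month.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical local equivalent of the mapper + reducer pair, without Hadoop.
class MaxTileValueLocalSketch {

    // Groups lines by tile name and keeps the largest value seen for each tile.
    static Map<String, Integer> maxByTile(List<String> csvLines) {
        return csvLines.stream()
                .map(line -> line.split(","))
                .filter(row -> row.length == 3)                     // same guard as the mapper
                .collect(Collectors.toMap(
                        row -> row[0],                              // key: tile name
                        row -> Integer.parseInt(row[2].trim()),     // value: parsed count
                        Integer::max,                               // merge duplicates by keeping the max
                        TreeMap::new));                             // sorted keys, like reducer output
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Fly,08,4", "Fly,10,1", "Gentle Ben,06,3", "Gentle Ben,07,2",
                "Go,05,29", "Go,06,45");
        maxByTile(rows).forEach((tile, max) -> System.out.println(tile + "\t" + max));
    }
}
```

Because the month is discarded before grouping, this approach (like the job above) reports only the maximum value per tile, not which months produced it.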