Is there a way to select only the rows with the highest value in MapReduce (Hadoop)?

Asked: 2018-04-11 03:13:52

Tags: hadoop mapreduce hdfs

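I have the output below, which lists title, month, and value (a sum) for each combination of title (key) and month (key). For each title, I want to select only the row with the highest value, for example "Fly 08(09,11) 4" or "Go 06 45", as you can see in my actual output. If this is possible, please enlighten me. If you have any questions, let me know and I will do my best to clarify.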

Fly,07,1
Fly,08,4
Fly,09,4
Fly,10,1
Fly,11,4
Fly,12,2
Gentle Ben,05,2
Gentle Ben,06,3
Gentle Ben,07,2
Gentle Ben,08,2
Gentle Ben,09,2
German aircraft guns and cannons,11,1
Go,04,20
Go,05,29
Go,06,45
Go,07,24
Go,08,28
Go,09,37

2 answers:

Answer 0 (score: 1)

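You need to send the first column to the reducer as the key and the remaining two columns as the value, so that all rows that start with the same key go to the same reducer, which can then find the maximum. In the reducer, iterate over the rows and compare the value in the last column. If only one row has the maximum value, the second column of the output contains a single month; otherwise, append every month that ties for the maximum. The code below is for your reference.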

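import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxValueGroupedMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Input line format: title,month,value
        String[] val = value.toString().split(",");
        if (val.length != 3) {
            return; // skip malformed lines
        }
        // Key: title. Value: "month,value".
        context.write(new Text(val[0]), new Text(val[1] + "," + val[2]));
    }
}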

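import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxValueGroupedReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        // Months whose value equals the current maximum, comma-separated.
        StringBuilder months = new StringBuilder();

        for (Text txt : values) {
            // Each value is "month,value".
            String[] st = txt.toString().split(",");
            int data = Integer.parseInt(st[1].trim());
            if (months.length() == 0 || data > max) {
                // First row or new maximum: discard the months collected so far.
                max = data;
                months.setLength(0);
                months.append(st[0]);
            } else if (data == max) {
                // Tie: keep this month as well.
                months.append(",").append(st[0]);
            }
        }
        context.write(key, new Text(months + "," + max));
    }
}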

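import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxValueGroupedDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        Configuration conf = new Configuration();
        conf.set("mapreduce.job.queuename", "default");
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "MaxValue");

        job.setJarByClass(MaxValueGroupedDriver.class);
        job.setMapperClass(MaxValueGroupedMapper.class);
        job.setReducerClass(MaxValueGroupedReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // args[0]: input path, args[1]: output path (must not exist yet)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Report success or failure through the exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}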

Output for the dataset above:

Fly 08,09,11,4
Gentle Ben,06,3
German aircraft guns and cannons,11,1
Go,06,45
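
The reducer's tie handling can be checked locally without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class name LocalMaxWithTies and the method maxWithTies are hypothetical, and a LinkedHashMap stands in for the shuffle phase that groups rows by title.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch: simulates the group-by-title and
// max-with-ties logic without any Hadoop dependency.
final class LocalMaxWithTies {

    // Returns title -> "month[,month...],max" for rows of the form "title,month,value".
    static Map<String, String> maxWithTies(List<String> rows) {
        // Stand-in for the shuffle: collect {month, value} pairs per title.
        Map<String, List<String[]>> grouped = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.split(",");
            if (cols.length != 3) {
                continue; // skip malformed rows
            }
            grouped.computeIfAbsent(cols[0], k -> new ArrayList<>())
                   .add(new String[] {cols[1], cols[2]});
        }

        // Stand-in for the reducer: keep every month that ties for the maximum.
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String[]>> e : grouped.entrySet()) {
            int max = Integer.MIN_VALUE;
            StringBuilder months = new StringBuilder();
            for (String[] pair : e.getValue()) {
                int data = Integer.parseInt(pair[1].trim());
                if (months.length() == 0 || data > max) {
                    max = data;
                    months.setLength(0);
                    months.append(pair[0]);
                } else if (data == max) {
                    months.append(",").append(pair[0]);
                }
            }
            result.put(e.getKey(), months + "," + max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "Fly,07,1", "Fly,08,4", "Fly,09,4", "Fly,10,1", "Fly,11,4", "Fly,12,2",
            "Go,04,20", "Go,05,29", "Go,06,45");
        // Print title and result separated by a tab.
        maxWithTies(rows).forEach((title, best) -> System.out.println(title + "\t" + best));
    }
}
```

With the Fly rows from the question, months 08, 09 and 11 all carry the value 4, so all three are kept. In a real job the order of the months depends on the order in which the reducer receives the values, which Hadoop does not guarantee without a secondary sort.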

Answer 1 (score: 0)

You can read the values in the mapper and compute the maximum in the reducer, like this:

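import java.io.IOException;
import java.util.stream.StreamSupport;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTileValue {

    public static class MaxTileValueMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Input line format: title,month,value
            String row[] = value.toString().split(",");
            if (row.length == 3) {
                String tile = row[0];
                String val = row[2];
                context.write(new Text(tile), new IntWritable(Integer.parseInt(val)));
            }
        }
    }

    public static class MaxTileValueReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sequential stream over the values; each int is read before the iterator advances.
            int max = StreamSupport.stream(values.spliterator(), false)
                    .mapToInt(IntWritable::get)
                    .max()
                    .orElse(0);
            context.write(key, new IntWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "MaxTileValue");

        // Lets Hadoop locate the jar that contains these classes when run on a cluster.
        job.setJarByClass(MaxTileValue.class);
        job.setMapperClass(MaxTileValueMapper.class);
        job.setReducerClass(MaxTileValueReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}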

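MaxTileValueMapper reads the input file line by line. With the default text input format, the key is the byte offset of the line within the file and the value is the line's content. The mapper splits the value and writes the title (named tile in the code) and its number to the context. MaxTileValueReducer then receives a key (the title) together with the list of values the mapper wrote for it, and computes the maximum.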

Also, you should make the input parseable, for example by using CSV format:

Fly,07,1
Fly,08,4
Fly,09,4
Fly,10,1
Fly,11,4
Fly,12,2
Gentle Ben,05,2
Gentle Ben,06,3
Gentle Ben,07,2
Gentle Ben,08,2
Gentle Ben,09,2
German aircraft guns and cannons,11,1
Go,04,20
Go,05,29
Go,06,45
Go,07,24
Go,08,28
Go,09,37

The output of the MapReduce job for the CSV above is:

Fly     4
Gentle Ben      3
German aircraft guns and cannons        1
Go      45
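
The flow described in this answer (split each line, group by title, take the maximum) can also be reproduced locally to sanity-check the expected result. This is a minimal sketch using Java streams rather than Hadoop; the class name LocalMaxTileValue and the method maxPerTitle are hypothetical, and a TreeMap is assumed as a stand-in for the sorted keys a reducer sees.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-alone sketch: mirrors the mapper and reducer of
// MaxTileValue with plain Java streams, without any Hadoop dependency.
final class LocalMaxTileValue {

    // Returns title -> maximum value for rows of the form "title,month,value".
    static Map<String, Integer> maxPerTitle(List<String> rows) {
        return rows.stream()
                .map(line -> line.split(","))
                .filter(row -> row.length == 3)           // same guard as the mapper
                .collect(Collectors.toMap(
                        row -> row[0],                     // key: title
                        row -> Integer.parseInt(row[2]),   // value: parsed number
                        Integer::max,                      // merge: keep the larger value
                        TreeMap::new));                    // keys in sorted order
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "Fly,07,1", "Fly,08,4", "Fly,09,4",
            "Gentle Ben,05,2", "Gentle Ben,06,3",
            "Go,04,20", "Go,06,45");
        // Print title and maximum separated by a tab.
        maxPerTitle(rows).forEach((title, max) -> System.out.println(title + "\t" + max));
    }
}
```

Note that this approach, like the job above, keeps only the title and the maximum; the month in which the maximum occurred is not part of the result.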