Question

问题陈述 - 找到最大值并将其与密钥一起打印

输入：

Key       Value
ABC       10
TCA       13
RTY       23
FTY       45

左侧栏中的键将是唯一的。不允许重复。

输出：

FTY       45

由于45是所有值中最高的，因此必须与键一起打印。

我根据此链接中共享的伪代码编写了MapReduce代码How to design the Key Value pairs for Mapreduce to find the maximum value in a set?

地图 -

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;

public class Map 
            extends Mapper<LongWritable,Text,Text,IntWritable>
{

private Text maxKey = new Text();
private IntWritable maxValue = new IntWritable(Integer.MIN_VALUE);

@Override
protected void map( LongWritable key,Text value,Context context) 
                        throws IOException,InterruptedException
{
    String line = value.toString().trim();
    StringTokenizer token = new StringTokenizer(line);

    if(token.countTokens() == 2)
    {
        String str = token.nextToken();

        while(token.hasMoreTokens())
        {
            int temp = Integer.parseInt(token.nextToken());

            if(temp > maxValue.get())
            {
                maxValue.set(temp);
                maxKey.set(str);
            }
        }
    }

}

@Override
protected void cleanup(Context context)
        throws IOException,InterruptedException
{
    context.write(maxKey,maxValue);
}
}

减少

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce 
                extends Reducer<Text,IntWritable,Text,IntWritable>
{

private Text maxKey = new Text();
private IntWritable maxValue = new IntWritable(Integer.MIN_VALUE);

@Override
protected void reduce(Text key,Iterable<IntWritable> values,Context context)
                                        throws IOException, 
InterruptedException
    {
        Iterator<IntWritable> itr = values.iterator();

        while(itr.hasNext())
        {
            int temp = itr.next().get();
            if(temp > maxValue.get())
            {
                maxKey.set(key);
                maxValue.set(temp);
            }
        }

    }

@Override
protected void cleanup(Context context)
        throws IOException,InterruptedException
{
    context.write(maxKey,maxValue);
}
}

驱动程序类：

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class MapReduceDriver
{
public static void main(String[] args) throws Exception
{
    Job job = new Job();

    job.setJarByClass(MapReduceDriver.class);
    job.setJobName("DNA Codon Analysis - Part 2");


    FileInputFormat.addInputPath(job,new Path(args[0]));
    FileOutputFormat.setOutputPath(job,new Path(args[1]));

    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setNumReduceTasks(1);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true)?0:1);

}
}

程序编译并运行以显示此输出 -

     -2147483648

map（）和reduce（）的maxValue设置可能不正确。如何正确设置值（使用Integer.MIN_VALUE进行初始化并在比较后进行更新），以便reduce（）函数接收正确的键值对？

Answer 1

由于您的密钥始终是唯一的，因此您无法在减速器中聚合它们。因此，如果您的数据集不是非常大，您可以使用一个公共密钥从mapper写入输出，这将强制映射器的所有输出仅转到一个reducer。

然后在reducer中，您可以遍历值以进行比较，并将最大值与键一起写入。

在mapper类中，使用公共键值对

将文件写入context

public class Map extends Mapper<LongWritable,Text,Text,Text>{
private final Text commonKey = new Text("CommonKey");

    @Override
    protected void map( LongWritable key,Text value,Context context) 
                            throws IOException,InterruptedException {
        String line = value.toString().trim();
        String[] kvpair = line.split("\\s+");
        context.write(commonKey, new Text(kvpair[0] + "," + kvpair[1]));
    }
}

然后在reducer中，找到最大值并写入上下文。

public static class Reduce extends Reducer<Text, Text, NullWritable, Text>{
    private final Integer MAXIMUM_VALUE = Integer.MIN_VALUE;
    public void reduce(Text commonKey, Iterable<Text> values, Context context){
        Integer finalMax = MAXIMUM_VALUE;
        String finalKey;
        for (Text value: values){
            String[] kvpair = value.toString().trim().split(",")
            if(Integer.parseInt(kvpair[1]) > finalMax){
                finalKey = kvpair[0];
                finalMax = Integer.parseInt(kvpair[1]);
            }
        }
        context.write(new Text(finalKey), new IntWritable(finalMax) );
    }
}

预计代码中会出现一些错误。只需在文本编辑器中编写它，就可以让您对如何以不同方式处理问题有所了解。

产生垃圾值的MapReduce键值对的输出值

1 个答案: