What exactly is the output of the mapper and reducer functions?

Time: 2016-05-07 20:05:30

Tags: hadoop mapreduce hadoop2 feature-extraction mapper

This is a follow-up question to "Extracting rows containing specific value using mapReduce and hadoop".

Mapper function

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    for(String word: words )
    {
        if(words[3].equals("40")){  
            saleValue.set(Integer.parseInt(words[0]));
            rangeValue.set(words[3]);
            con.write( rangeValue , saleValue );
        }
    }
}   
}

Reducer function

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>  
{  
    private IntWritable result = new IntWritable();  
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException  
    {  
        for(IntWritable value : values)  
        {  
            result.set(value.get());  
            con.write(word, result);  
        }  
    }  
}

Output obtained

40 105  
40 105  
40 105  
40 105

Edit 1: But the expected output is

40 102  
40 104  
40 105

What am I doing wrong?

What exactly is happening in the mapper and reducer functions?

3 answers:

Answer 0: (score: 1)

  

What exactly is happening

You are consuming comma-delimited lines of text, splitting them on the commas, and filtering out some values. If all you are doing is extracting those values, then con.write() should be called only once per line.

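After the map phase, the framework groups all the "40" keys you have output and forms a list of all the values that were written with that key. That list is what the reducer reads in.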
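The grouping described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration: the class name `ShuffleSketch`, the method `groupByKey`, and the sample pairs are assumptions made up for the example, and real Hadoop does not promise any particular order for the values of a key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the grouping step between map and reduce; not Hadoop code.
final class ShuffleSketch {

    // Groups emitted (key, value) pairs by key. This sketch keeps values in
    // emission order; Hadoop itself makes no such guarantee.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> emitted) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output: three records, all written under key "40".
        List<Map.Entry<String, Integer>> emitted = List.of(
                Map.entry("40", 102),
                Map.entry("40", 104),
                Map.entry("40", 105));
        // The reducer would be called once for "40" with the whole list of values.
        System.out.println(groupByKey(emitted));
    }
}
```

Each entry of the resulting map corresponds to one call of reduce(): the key is the `word` argument and the list plays the role of the `values` iterable.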

You should try something like this in your map function.

con.write()

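If you do not want one duplicate value for every element of the split string, then get rid of the for loop.

All your reducer does is print out what it receives from the mapper.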
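To make the effect of the for loop concrete, here is a small plain-Java sketch that mimics both versions of the map logic on a single line. It is not Hadoop code: `MapEmitSketch`, its two methods, and the sample line `105,x,y,40` are hypothetical names and data chosen for illustration, and a returned list stands in for the calls to con.write().

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the two map() variants; not Hadoop code.
final class MapEmitSketch {

    // Mirrors the question's mapper: the filter sits inside a loop over the
    // fields, so a matching line is written once per field.
    static List<String> mapWithLoop(String line) {
        List<String> out = new ArrayList<>();
        String[] words = line.split(",");
        for (String word : words) {
            if (words[3].equals("40")) {
                out.add(words[3] + "\t" + Integer.parseInt(words[0]));
            }
        }
        return out;
    }

    // Same filter without the loop: at most one write per line.
    static List<String> mapWithoutLoop(String line) {
        List<String> out = new ArrayList<>();
        String[] words = line.split(",");
        if (words[3].equals("40")) {
            out.add(words[3] + "\t" + Integer.parseInt(words[0]));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "105,x,y,40"; // hypothetical input line with four fields
        System.out.println(mapWithLoop(line).size());
        System.out.println(mapWithoutLoop(line).size());
    }
}
```

With a four-field line the looped version writes the same pair four times, which matches the shape of the repeated "40 105" rows in the question, while the version without the loop writes it once.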

Answer 1: (score: 1)

In the context of the original question: since you are copying entries, you do not need a loop in either the mapper or the reducer:

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    if(words[3].equals("40")){  
       saleValue.set(Integer.parseInt(words[0]));
       rangeValue.set(words[3]);
       con.write(rangeValue , saleValue );
    }
}   
}

In the reducer, as @Serhiy suggested in the original question, you only need one line of code:

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>
{
    private IntWritable result = new IntWritable();
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException
    {
        // Write the key once per group; the values are not written.
        con.write(word, null);
    }
}

Update for "Edit 1": I will leave that as a trivial exercise :)
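One possible reading of that exercise, sketched in plain Java rather than Hadoop: writing the key with a null value produces one line per key, whereas the output expected in "Edit 1" needs one line per value. The class `ReduceSketch` and both method names are hypothetical, and the sketch assumes the mapper already emits each matching row exactly once.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for two reduce() variants; not Hadoop code.
final class ReduceSketch {

    // Mirrors con.write(word, null): one output line per key, values ignored.
    static List<String> reduceKeyOnly(String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        out.add(key);
        return out;
    }

    // Writes one "key value" line for every value received for the key.
    static List<String> reduceEachValue(String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        for (int value : values) {
            out.add(key + " " + value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Values taken from the expected output in the question.
        List<Integer> values = List.of(102, 104, 105);
        reduceKeyOnly("40", values).forEach(System.out::println);
        reduceEachValue("40", values).forEach(System.out::println);
    }
}
```

The second variant is the same shape as the reducer in the question, which suggests that once the mapper stops duplicating rows, that reducer may already be enough for "Edit 1".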

Answer 2: (score: 0)

The mapper output will look like this:

<word,count>

The reducer output will look like this:

<unique word, its total count>

For example: a line is read, all the words in it are counted, and they are put into <key,value> pairs:

<40,1>
<140,1>
<50,1>
<40,1> ..

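Here 40, 50, 140, ... are all keys, and the value is the count of occurrences of that key in a line. This happens in the mapper.

These <key,value> pairs are then sent to the reducer, where identical keys are reduced to a single key, and all the values associated with that key are summed to give one <key,value> pair per key. So the result of the reducer will be: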

<40,10>
<50,5>
...
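The word-count flow described above can be sketched in a few lines of plain Java. This is a simplification, not Hadoop code: it folds the map step (emit 1 per word) and the reduce step (sum per key) into a single pass, and the names `WordCountSketch` and `countWords` as well as the sample line are assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for word count; map and reduce are merged into one pass.
final class WordCountSketch {

    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String word : line.split(",")) {
            // "Map": contribute <word, 1>. "Reduce": add it to the running total for that key.
            totals.merge(word, 1, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical line built from the keys used in the example above.
        System.out.println(countWords("40,140,50,40"));
    }
}
```

In a real job the pairs from all mappers would be grouped by key before the summation runs, but the per-key arithmetic is the same.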

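In your case, the reducer is not doing anything. The unique values/words found by the mapper are simply given as output.

Ideally, you should reduce and get output like: "40,150" was found 5 times on the same line.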