Question

This is a follow-up to Extracting rows containing specific value using mapReduce and hadoop.

Mapper function

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    for(String word: words )
    {
        if(words[3].equals("40")){  
            saleValue.set(Integer.parseInt(words[0]));
            rangeValue.set(words[3]);
            con.write( rangeValue , saleValue );
        }
    }
}   
}

Reducer function

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>  
{  
    private IntWritable result = new IntWritable();  
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException  
    {  
        for(IntWritable value : values)  
        {  
            result.set(value.get());  
            con.write(word, result);  
        }  
    }  
}

Output obtained

Edit 1: But the expected output is

40 102  
40 104  
40 105

What am I doing wrong?

What exactly is happening in the mapper and reducer functions?

Answer 1

What exactly is happening

You are reading lines of comma-separated text, splitting them on the commas, and filtering out some values. If all you are doing is extracting those values, then con.write() should be called only once per line.

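The map output is then grouped by key: all the "40" keys you output are collected together, forming a list of all the values written with that key. That list is what the reducer reads.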

You should probably try a map function that calls con.write() only once per line.

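If you do not want each value duplicated once per field of the split string, get rid of the for loop.

All your reducer does is print out whatever it received from the mapper.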
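
To make the duplication concrete, here is a minimal sketch in plain Java with no Hadoop classes. The class name MapLoopSketch, its two methods, and the sample row "102,x,y,40" are assumptions made for illustration; only the filter on words[3] and the parse of words[0] come from the question's mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the mapper logic; this is not Hadoop code.
class MapLoopSketch {

    // Mirrors the question's map(): the filter sits inside a loop over every
    // field, so a matching line is emitted once per field.
    static List<String> mapWithLoop(String line) {
        List<String> out = new ArrayList<>();
        String[] words = line.split(",");
        for (String word : words) {
            if (words[3].equals("40")) {
                out.add(words[3] + " " + Integer.parseInt(words[0]));
            }
        }
        return out;
    }

    // Same filter without the loop: at most one record per line.
    static List<String> mapOnce(String line) {
        List<String> out = new ArrayList<>();
        String[] words = line.split(",");
        if (words[3].equals("40")) {
            out.add(words[3] + " " + Integer.parseInt(words[0]));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "102,x,y,40"; // hypothetical input row with four fields
        System.out.println(mapWithLoop(line).size()); // prints 4
        System.out.println(mapOnce(line).size());     // prints 1
    }
}
```

The loop variable word is never used inside the loop body, which is a hint that the loop is not needed for this filter.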

Answer 2

In the context of the original question: you do not need a loop in either the mapper or the reducer, because the loop is what duplicates the entries:

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    if(words[3].equals("40")){  
       saleValue.set(Integer.parseInt(words[0]));
       rangeValue.set(words[3]);
       con.write(rangeValue , saleValue );
    }
}   
}

In the reducer, as @Serhiy suggested in the original question, you need just one line of code:

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>
{
    private IntWritable result = new IntWritable();
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException
    {
        // Emit the key once per group, with no value.
        con.write(word, null);
    }
}

Update for "Edit 1": I will leave that as a trivial exercise :)
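
For anyone who wants to check their attempt at that exercise, the following is a rough plain-Java simulation, not Hadoop code. The class EditOneSketch, its methods, and the four input rows are hypothetical; the rows were chosen only so that the result matches the expected output in the question. A TreeMap stands in for the grouping that happens between map and reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulates map -> group by key -> reduce in memory; no Hadoop classes involved.
class EditOneSketch {

    // Map step: one (field 3, field 0) pair per matching line, no loop over fields.
    static void map(String line, Map<String, List<Integer>> grouped) {
        String[] words = line.split(",");
        if (words[3].equals("40")) {
            // computeIfAbsent plays the role of the grouping: values are collected per key.
            grouped.computeIfAbsent(words[3], k -> new ArrayList<>())
                   .add(Integer.parseInt(words[0]));
        }
    }

    // Reduce step: write the key once for each value in its list.
    static List<String> reduce(String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        for (int value : values) {
            out.add(key + " " + value);
        }
        return out;
    }

    static List<String> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            map(line, grouped);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical rows: field 0 is the sale value, field 3 is the range.
        List<String> lines = Arrays.asList(
                "102,a,b,40", "103,a,b,50", "104,a,b,40", "105,a,b,40");
        run(lines).forEach(System.out::println); // prints 40 102, 40 104, 40 105 on separate lines
    }
}
```

In this simulation the values keep their input order. A real job makes no such promise about the order of values within a key unless it is configured to sort them.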

Answer 3

The mapper output will look like this:

<word,count>

The reducer output looks like this:

<unique word, its total count>

For example: a line is read, and all the words in it are counted and put into <key,value> pairs:

<40,1>
<140,1>
<50,1>
<40,1> ..

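Here 40, 50, 140, ... are all keys, and the value is the count of occurrences of that key in a line. This happens in the mapper.

These key,value pairs are then sent to the reducer, where identical keys are reduced to a single key and all the values associated with that key are summed to give one key,value pair per key. So the reducer's result would be: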

<40,10>
<50,5>
...
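
The counting pattern described above can be sketched in plain Java without Hadoop. The class WordCountSketch and the two sample lines are assumptions for illustration, so the totals differ from the <40,10> and <50,5> figures above. Each token corresponds to a <word,1> pair from the mapper, and Map.merge does the per-key summing that the reducer would do.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory word count: emit <word,1> per token, then sum the ones per key.
class WordCountSketch {

    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(",")) {
                // "Map" emits <word,1>; merge() adds it to that key's running total.
                totals.merge(word, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical comma-separated input lines.
        System.out.println(count("40,140,50,40", "40,50")); // prints {140=1, 40=3, 50=2}
    }
}
```

Keys are compared as strings here, which is why 140 sorts before 40.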

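In your case, the reducer is not doing anything. The values/words found by the mapper are simply passed through as output.

Ideally, you would reduce and get output like: "40,150" was found 5 times on the same line.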

What exactly is the output of the mapper and reducer functions
