MapReduce: value count in the reducer is always 1

Date: 2016-12-30 13:58:35

Tags: hadoop mapreduce cloudera

I am using Cloudera to implement a MapReduce job. My input is JSON, and my mapper picks the values of "asin" and "reviewText" from it. The input looks like this:

{"reviewerID": "A2PUSR7ROG0Z6T", "asin": "9742356831", "reviewerName": "Terry Bisgrove \"Mr.E.Man\"", "helpful": [2, 2], "reviewText": "I like other styles of Mae Ploy curry paste, but the green just doesn't work for me. Overwhelming garlic, no heat, and very bland. I would not purchase this product again.", "overall": 3.0, "summary": "OK Product", "unixReviewTime": 1344297600, "reviewTime": "08 7, 2012"}
{"reviewerID": "A2ANBEX40KLY4O", "asin": "9742356831", "reviewerName": "TrishS \"TrishS\"", "helpful": [3, 4], "reviewText": "I have both the red and green curry paste.  The green is milder.  I use both of them in variety of dishes and often spice up soups and stews that need a little zing.  It is so convient to have them in the frig.", "overall": 5.0, "summary": "Tasty and fast", "unixReviewTime": 1310601600, "reviewTime": "07 14, 2011"}
{"reviewerID": "A1C8NAHYR6Z10F", "asin": "B00004S1C5", "reviewerName": "A. Horikawa", "helpful": [1, 2], "reviewText": "These dyes create awesome colors for kids crafts. I have used them to make finger paint, paint, play dough, and salt dough.Another reviewer stated that they are not natural - this is CORRECT. They are definitely artificial dyes. I tried making my own dyes, and when that fell through, these worked great in a pinch. You only need a couple drops for really vibrant color. And they are pretty easy to clean - don't stain after they've been made into whatever craft.Good product for the price!", "overall": 5.0, "summary": "Great for kids crafts!", "unixReviewTime": 1344297600, "reviewTime": "08 7, 2012"}
{"reviewerID": "A14YSMLYLJEMET", "asin": "B00004S1C5", "reviewerName": "Amazon Customer", "helpful": [8, 11], "reviewText": "This product is no where near natural / organic-I only wish I had seen the other reviews before purchasing! It contains all the things I did not want-which is why I was looking for a natural alternative. They need to have an ingredient list on here to avoid this...I am &#34;returning&#34; item. I am trying to avoid my children's exposure to yellow 5, red 40 and so on...I do not understand how they can still make these things knowing what they can cause. This may be fine for someone that doesn't read labels or care what their kids eat-but not for my family.", "overall": 1.0, "summary": "Not natural/organic at all", "unixReviewTime": 1364515200, "reviewTime": "03 29, 2013"}
...

My reducer iterates over all the values and writes the size of the value list for each key. First, the mapper:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.JSONObject;

public class SentimentMapper extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

      JSONObject obj = new JSONObject(value.toString());
      context.write(new Text(obj.getString("asin")), new Text(obj.getString("reviewText")));

  }

}

And the reducer:

import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SentimentReducer extends Reducer<Text, Text, Text, Text> {

  @Override
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {

      ArrayList<String> list = new ArrayList<String>();
      for(Text val : values) {
          list.add(new String(val.toString()));
      }

      context.write(key, new Text(String.valueOf(list.size())));

  }
}

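Unfortunately, the size comes out as 1 for every key. As you can see in the input JSON, some keys (for example B00004S1C5) should have more than one value. Can someone help me solve this?

Update: the driver class was requested; the result I end up with is: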

616719923X  1
9742356831  1
B00004S1C5  1
B0000531B7  1
B00005344V  1
B0000537AF  1
B00005C2M2  1
B00006IUTN  1
B0000CCZYY  1
B0000CD06J  1
B0000CDBQN  1
B0000CDEPD  1
B0000CETGM  1
B0000CFLCT  1
B0000CFLIL  1

Not sure whether this is relevant, but I export the job as a runnable JAR file and call it from the command line.

1 answer:

Answer 0 (score: 1)

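Update: You do not need a combiner in your program. Remove or comment out the combiner in the driver class, and that should fix your program.

The combiner sends this input to the reducer: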

9742356831 ----- 2
B00004S1C5 ----- 2

So the reducer outputs:

9742356831 ----- 1
B00004S1C5 ----- 1
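The effect described above can be reproduced in plain Java without a cluster. This is only a sketch under two assumptions, since the driver class is not shown here: that the driver registered the same counting logic as `SentimentReducer` as the combiner, and that both records for a key pass through the same combiner call. The class name `CombinerPitfallDemo` and the helper `countValues` are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerPitfallDemo {

    // Same logic as SentimentReducer: emit how many values were received.
    public static String countValues(List<String> values) {
        List<String> list = new ArrayList<>();
        for (String val : values) {
            list.add(val);
        }
        return String.valueOf(list.size());
    }

    public static void main(String[] args) {
        // Two (shortened, made-up) review texts emitted by the mapper for one key.
        List<String> mapOutput = Arrays.asList("review one", "review two");

        // No combiner: the reducer sees both values.
        System.out.println("without combiner: " + countValues(mapOutput));

        // Assumed combiner with the same logic: it collapses the two values
        // into the single value "2", and the reducer then counts that one value.
        List<String> combined = Arrays.asList(countValues(mapOutput));
        System.out.println("with combiner: " + countValues(combined));
    }
}
```

Counting list entries is not safe to apply twice: the second pass counts the single partial result instead of the original records, which matches the 2 becoming 1 shown above.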

I tested your code and it gave me the expected result, but I would rewrite your program as shown below.

Output:

9742356831  2
B00004S1C5  2

Rewritten mapper and reducer:

// Both classes are static nested classes, meant to sit inside an enclosing class
// (for example the driver). They need java.io.IOException, the Hadoop types
// IntWritable, LongWritable, Text, Mapper and Reducer, and org.json's
// JSONObject and JSONException.
public static class jsonDataMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            JSONObject obj = new JSONObject(value.toString());
            // Emit (asin, 1) instead of (asin, reviewText): only the count is needed.
            context.write(new Text(obj.getString("asin")), new IntWritable(1));
        } catch (JSONException e) {
            // Skip records that cannot be parsed or that have no "asin" field.
            e.printStackTrace();
        }
    }
}

public static class jsonDataReducer extends Reducer<Text, IntWritable, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper rather than counting list entries.
        int sum = 0;
        for (IntWritable i : values) {
            sum += i.get();
        }
        context.write(key, new Text(String.valueOf(sum)));
    }
}
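One reason the summing version is more robust is that adding partial sums gives the same total as adding the raw 1s, so the logic stays correct even if it is applied in two stages. The plain-Java sketch below illustrates this with a hypothetical split of three records across two map tasks; `SumCombinerDemo` and `sum` are names made up for the illustration. Note that `jsonDataReducer` as written emits `Text` values, so a real combiner would need its own class that emits `IntWritable`, because a combiner's output types must match the mapper's output types.

```java
import java.util.Arrays;
import java.util.List;

public class SumCombinerDemo {

    // Same logic as jsonDataReducer: add up the counts received for a key.
    public static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical split: three records for one key, spread over two map tasks.
        List<Integer> task1 = Arrays.asList(1, 1);
        List<Integer> task2 = Arrays.asList(1);

        // Reducer fed directly with the mapper output.
        int direct = sum(Arrays.asList(1, 1, 1));

        // Reducer fed with per-task partial sums, as a combiner would produce.
        int viaCombiner = sum(Arrays.asList(sum(task1), sum(task2)));

        System.out.println(direct + " " + viaCombiner); // prints "3 3"
    }
}
```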