How to find the top 10 most popular movies with their titles in MapReduce

Asked: 2018-05-31 07:54:56

Tags: java mapreduce bigdata hadoop2

I have a movies dataset and a ratings dataset, laid out like this:

movies.txt           MovieID – Title – Genres
ratings.txt          UserID – MovieID – Rating – Timestamp

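I am trying to write an MR job that finds the top 10 movies. So far I have written a job that outputs each movie's title and its view count (the number of ratings it received), like this: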

 American Beauty (1999)::3428
 Star Wars: Episode IV – A New Hope (1977)::2991
 Star Wars: Episode V – The Empire Strikes Back (1980)::2990
 Star Wars: Episode VI – Return of the Jedi (1983)::2883
 ........................................................

Here is the code I wrote for that job.

Mapper for the movies data

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (movieId, "M" + title) for each line of movies.txt.
    public class MoviesDataMapper extends Mapper<Object, Text, Text, Text> {
        private Text movieId = new Text();
        private Text outvalue = new Text();

        @Override
        public void map(Object key, Text values, Context context)
                throws IOException, InterruptedException {
            String data = values.toString();
            // Expected layout: MovieID::Title::Genres
            String[] field = data.split("::", -1);
            if (field.length == 3 && field[0].length() > 0) {
                movieId.set(field[0]);
                outvalue.set("M" + field[1]); // "M" tags the value as movie data
                context.write(movieId, outvalue);
            }
        }
    }

Mapper for the ratings data

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (movieId, "R" + rating) for each line of ratings.txt.
    public class RatingDataMapper extends Mapper<Object, Text, Text, Text> {
        private Text movieId = new Text();
        private Text outvalue = new Text();

        @Override
        public void map(Object key, Text values, Context context)
                throws IOException, InterruptedException {
            String data = values.toString();
            // Expected layout: UserID::MovieID::Rating::Timestamp
            String[] field = data.split("::", -1);
            // Validate the MovieID (field[1]), since it is used as the join key.
            if (field.length == 4 && field[1].length() > 0) {
                movieId.set(field[1]);
                outvalue.set("R" + field[2]); // "R" tags the value as rating data
                context.write(movieId, outvalue);
            }
        }
    }

Reducer code

    import java.io.IOException;
    import java.util.ArrayList;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Joins movie titles with their ratings on MovieID and emits (title, rating count).
    public class MoviesRatingJoinReducer extends Reducer<Text, Text, Text, Text> {
        private ArrayList<Text> listMovies = new ArrayList<Text>();
        private ArrayList<Text> listRating = new ArrayList<Text>();

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            listMovies.clear();
            listRating.clear();
            // Split the values by their tag: "M" = movie title, "R" = rating.
            for (Text text : values) {
                if (text.charAt(0) == 'M') {
                    listMovies.add(new Text(text.toString().substring(1)));
                } else if (text.charAt(0) == 'R') {
                    listRating.add(new Text(text.toString().substring(1)));
                }
            }
            executeJoinLogic(context);
        }

        // Inner join: write output only when the movie has at least one rating.
        private void executeJoinLogic(Context context)
                throws IOException, InterruptedException {
            if (!listMovies.isEmpty() && !listRating.isEmpty()) {
                for (Text moviesData : listMovies) {
                    context.write(moviesData, new Text(String.valueOf(listRating.size())));
                }
            }
        }
    }

I can't work out the logic to find the top 10 rated movies. Please help.
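One common way to approach the missing step, sketched here under assumptions rather than taken from the job above, is a second pass over the `title::count` output that keeps only the N largest counts in a bounded min-heap. The plain-Java example below shows just that selection logic, without any Hadoop classes. The class name `TopMovies`, the method `topN`, and the assumption that the count follows the last `::` on each line are all illustrative, not part of the original code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: picks the N most-rated movies from "title::count" lines.
final class TopMovies {

    // Returns up to n lines, normalized to "title::count", ordered by count descending.
    static List<String> topN(List<String> lines, int n) {
        // Min-heap ordered by count, so the smallest kept entry is evicted first.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (String line : lines) {
            int sep = line.lastIndexOf("::"); // assumed: the count follows the last "::"
            if (sep < 0) {
                continue; // skip lines without a separator
            }
            String title = line.substring(0, sep).trim();
            long count;
            try {
                count = Long.parseLong(line.substring(sep + 2).trim());
            } catch (NumberFormatException e) {
                continue; // skip lines whose count is not a number
            }
            heap.add(new AbstractMap.SimpleEntry<>(title, count));
            if (heap.size() > n) {
                heap.poll(); // drop the current minimum, keeping at most n entries
            }
        }
        // The heap drains in ascending order; reverse to get descending counts.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            Map.Entry<String, Long> e = heap.poll();
            result.add(e.getKey() + "::" + e.getValue());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        // Sample lines modeled on the output above (ASCII hyphens replace the dashes).
        List<String> lines = Arrays.asList(
                "Star Wars: Episode VI - Return of the Jedi (1983)::2883",
                "American Beauty (1999)::3428",
                "Star Wars: Episode IV - A New Hope (1977)::2991");
        for (String line : topN(lines, 2)) {
            System.out.println(line);
        }
    }
}
```

In a Hadoop job, the same bounded heap could be kept in each mapper and flushed in `cleanup()`, with a single reducer merging the partial top-N lists into the final ten; that wiring is not shown here. Entries with equal counts come out in no particular order in this sketch.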

0 answers:

No answers yet