I have a movies dataset and a ratings dataset like these:
movies.txt: MovieID::Title::Genres
ratings.txt: UserID::MovieID::Rating::Timestamp
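Both mappers split each line on `::` with a limit of `-1`. The `-1` matters: without it, `String.split` drops trailing empty strings, so a line ending in an empty field would produce fewer fields and fail the `field.length == 3` / `field.length == 4` checks. A quick plain-Java check of what the split produces (the sample lines are made up for illustration, and `RecordSplitDemo` is a hypothetical name):

```java
import java.util.Arrays;

// Illustrative only: shows how split("::", -1) breaks up one line from each file.
// The sample lines below are made up for this example.
class RecordSplitDemo {
    // Splits a "::"-delimited line; limit -1 keeps trailing empty fields.
    static String[] split(String line) {
        return line.split("::", -1);
    }

    public static void main(String[] args) {
        // movies.txt -> 3 fields: [0]=MovieID, [1]=Title, [2]=Genres
        System.out.println(Arrays.toString(split("1::American Beauty (1999)::Comedy|Drama")));
        // ratings.txt -> 4 fields: [0]=UserID, [1]=MovieID, [2]=Rating, [3]=Timestamp
        System.out.println(Arrays.toString(split("10::1::5::978300760")));
        // Trailing empty field is kept with -1 (3 fields), dropped without it (2 fields).
        System.out.println(split("1::Title::").length + " vs " + "1::Title::".split("::").length);
    }
}
```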
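I am trying to write a MapReduce job that finds the top 10 movies. So far I have written a job that outputs each movie's title together with its number of views (the number of ratings it received), like this: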
American Beauty (1999)::3428
Star Wars: Episode IV – A New Hope (1977)::2991
Star Wars: Episode V – The Empire Strikes Back (1980)::2990
Star Wars: Episode VI – Return of the Jedi (1983)::2883
...
Here is the code I wrote for this job.
Mapper for the movies data:
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (MovieID, "M" + Title) for each line of movies.txt (MovieID::Title::Genres).
public class MoviesDataMapper extends Mapper<Object, Text, Text, Text> {
    private Text movieId = new Text();
    private Text outvalue = new Text();

    @Override
    public void map(Object key, Text values, Context context) throws
            IOException, InterruptedException {
        String data = values.toString();
        String[] field = data.split("::", -1);
        // Expect exactly 3 fields and a non-empty MovieID.
        if (field.length == 3 && field[0].length() > 0) {
            movieId.set(field[0]);
            // Tag the value with "M" so the reducer can tell titles from ratings.
            outvalue.set("M" + field[1]);
            context.write(movieId, outvalue);
        }
    }
}
Mapper for the ratings data:
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (MovieID, "R" + Rating) for each line of ratings.txt
// (UserID::MovieID::Rating::Timestamp).
public class RatingDataMapper extends Mapper<Object, Text, Text, Text> {
    private Text movieId = new Text();
    private Text outvalue = new Text();

    @Override
    public void map(Object key, Text values, Context context) throws
            IOException, InterruptedException {
        String data = values.toString();
        String[] field = data.split("::", -1);
        // Expect exactly 4 fields and a non-empty UserID.
        if (field.length == 4 && field[0].length() > 0) {
            movieId.set(field[1]);
            // Tag the value with "R" so the reducer can tell ratings from titles.
            outvalue.set("R" + field[2]);
            context.write(movieId, outvalue);
        }
    }
}
Reducer code:
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Joins titles ("M" values) with ratings ("R" values) for the same MovieID
// and emits (Title, number of ratings).
public class MoviesRatingJoinReducer extends Reducer<Text, Text, Text, Text> {
    private ArrayList<Text> listMovies = new ArrayList<Text>();
    private ArrayList<Text> listRating = new ArrayList<Text>();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        listMovies.clear();
        listRating.clear();
        for (Text text : values) {
            // Copy into new Text objects: Hadoop reuses the instance handed out by 'values'.
            if (text.charAt(0) == 'M') {
                listMovies.add(new Text(text.toString().substring(1)));
            } else if (text.charAt(0) == 'R') {
                listRating.add(new Text(text.toString().substring(1)));
            }
        }
        executeJoinLogic(context);
    }

    // Inner join: only movies that have a title and at least one rating are written.
    private void executeJoinLogic(Context context)
            throws IOException, InterruptedException {
        if (!listMovies.isEmpty() && !listRating.isEmpty()) {
            for (Text moviesData : listMovies) {
                context.write(moviesData, new Text(String.valueOf(listRating.size())));
            }
        }
    }
}
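To make the data flow of the two mappers and the reducer concrete, here is a plain-Java sketch with no Hadoop dependencies, so it can run locally. It tags values with "M"/"R", groups them by MovieID the way the shuffle would, and then counts the "R" values per movie as the reducer does. `JoinSimulation` and `join` are hypothetical names, and this only imitates the job; it is not how Hadoop executes it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the reduce-side join; not Hadoop code.
class JoinSimulation {
    // Mimics map + shuffle + reduce and returns title -> number of ratings.
    static Map<String, Integer> join(List<String> movieLines, List<String> ratingLines) {
        // "Map" + "shuffle": tag values and group (sorted) by MovieID.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : movieLines) {
            String[] f = line.split("::", -1);
            if (f.length == 3 && !f[0].isEmpty()) {
                grouped.computeIfAbsent(f[0], k -> new ArrayList<>()).add("M" + f[1]);
            }
        }
        for (String line : ratingLines) {
            String[] f = line.split("::", -1);
            if (f.length == 4 && !f[0].isEmpty()) {
                grouped.computeIfAbsent(f[1], k -> new ArrayList<>()).add("R" + f[2]);
            }
        }
        // "Reduce": per MovieID, separate titles from ratings and emit title -> count.
        Map<String, Integer> out = new LinkedHashMap<>();
        for (List<String> values : grouped.values()) {
            List<String> titles = new ArrayList<>();
            int ratings = 0;
            for (String v : values) {
                if (v.charAt(0) == 'M') {
                    titles.add(v.substring(1));
                } else if (v.charAt(0) == 'R') {
                    ratings++;
                }
            }
            // Same inner-join rule as executeJoinLogic: need a title and a rating.
            if (!titles.isEmpty() && ratings > 0) {
                for (String t : titles) {
                    out.put(t, ratings);
                }
            }
        }
        return out;
    }
}
```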
I can't work out the logic for finding the top 10 highest-rated movies. Please help.
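One common way to get the top 10 is to run a second, small pass over the first job's output and keep only the N largest counts in a bounded min-heap, so memory stays at N entries no matter how many movies there are. The plain-Java sketch below shows only that selection logic. It assumes each input line looks like `Title::count`, matching the sample output in the question, and the names `TopNMovies`, `Entry` and `topN` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: selects the n movies with the highest counts from
// lines shaped like "Title::count". Not Hadoop code.
class TopNMovies {
    // One parsed line of the first job's output.
    static final class Entry {
        final String title;
        final long count;

        Entry(String title, long count) {
            this.title = title;
            this.count = count;
        }
    }

    // Ascending by count, so a PriorityQueue using it is a min-heap.
    static final Comparator<Entry> BY_COUNT = Comparator.comparingLong(e -> e.count);

    // Keeps the n highest-count entries using a min-heap that never grows past n.
    static List<Entry> topN(List<String> lines, int n) {
        PriorityQueue<Entry> heap = new PriorityQueue<>(BY_COUNT);
        for (String line : lines) {
            // Split on the LAST "::" so titles containing colons stay intact.
            int sep = line.lastIndexOf("::");
            if (sep < 0) {
                continue; // skip malformed lines
            }
            String title = line.substring(0, sep);
            long count = Long.parseLong(line.substring(sep + 2).trim());
            heap.offer(new Entry(title, count));
            if (heap.size() > n) {
                heap.poll(); // evict the current smallest count
            }
        }
        // Copy out and sort from highest to lowest count.
        List<Entry> result = new ArrayList<>(heap);
        result.sort(BY_COUNT.reversed());
        return result;
    }
}
```

Because the heap holds at most n entries, each mapper of a second job could apply this selection to its own input split and write its local top 10 from `cleanup()`. Running that job with `job.setNumReduceTasks(1)` lets a single reducer apply the same selection to all the candidates and produce the global top 10. If "top rated" should mean the highest average rating rather than the most ratings, the reducer above would average the values it already collects in `listRating` instead of counting them; the selection step stays the same.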