I am trying to find the top ten movies from a huge dataset using Hadoop, with a MapReduce approach. I have sorted the data using a local collection (a TreeMap), but I understand this method is not recommended. What is the correct way to sort data when processing large amounts of it in the Mapper? My Mapper and Reducer code is below.
Mapper code:
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HighestViewedMoviesMapper extends Mapper<Object, Text, NullWritable, Text> {

    // Holds this mapper's ten highest view counts, ordered by views ascending.
    private TreeMap<Integer, Text> highestView = new TreeMap<Integer, Text>();

    @Override
    public void map( Object key, Text values, Context context ) throws IOException, InterruptedException {
        String data = values.toString();
        // Input lines look like "movie::views"; String.split never returns null.
        String[] field = data.split( "::", -1 );
        if ( field.length == 2 ) {
            int views = Integer.parseInt( field[1] );
            highestView.put( views, new Text( field[0] + "::" + field[1] ) );
            // Evict the entry with the smallest view count once we exceed ten.
            if ( highestView.size() > 10 ) {
                highestView.remove( highestView.firstKey() );
            }
        }
    }

    @Override
    protected void cleanup( Context context ) throws IOException, InterruptedException {
        // Emit this mapper's local top ten when the task finishes.
        for ( Map.Entry<Integer, Text> entry : highestView.entrySet() ) {
            context.write( NullWritable.get(), entry.getValue() );
        }
    }
}
Reducer code:
import java.io.IOException;
import java.util.TreeMap;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class HighestViewMoviesReducer extends Reducer<NullWritable, Text, NullWritable, Text> {

    // Merges the mappers' local top tens into a global top ten.
    private TreeMap<Integer, Text> highestView = new TreeMap<Integer, Text>();

    @Override
    public void reduce( NullWritable key, Iterable<Text> values, Context context )
        throws IOException, InterruptedException {
        for ( Text value : values ) {
            String data = value.toString();
            String[] field = data.split( "::", -1 );
            if ( field.length == 2 ) {
                highestView.put( Integer.parseInt( field[1] ), new Text( value ) );
                if ( highestView.size() > 10 ) {
                    highestView.remove( highestView.firstKey() );
                }
            }
        }
        // Write the final top ten in descending order of views.
        for ( Text t : highestView.descendingMap().values() ) {
            context.write( NullWritable.get(), t );
        }
    }
}
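One thing worth noting about the TreeMap-based top-ten pattern above (independent of Hadoop): a TreeMap keeps exactly one value per key, so two movies with the same view count collide, and the later put() silently replaces the earlier one. A minimal standalone sketch of the pitfall, using hypothetical movie names:

```java
import java.util.TreeMap;

public class TreeMapCollision {
    public static void main(String[] args) {
        TreeMap<Integer, String> top = new TreeMap<Integer, String>();
        top.put(500, "MovieA::500");
        top.put(500, "MovieB::500"); // same key: MovieA is overwritten
        // Only one entry survives, so one movie is lost from the top ten.
        System.out.println(top.size());      // prints 1
        System.out.println(top.get(500));    // prints MovieB::500
    }
}
```

If ties in view counts matter for the final ranking, the key would need to be made unique (for example by including the movie id in a composite key) rather than keyed on the raw count alone.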
Can someone tell me the best way to do this? Thanks in advance.