Question

下面是我使用自定义可写可比较的实现简单MapReduce作业的代码。

public class MapReduceKMeans {

public static class MapReduceKMeansMapper extends
        Mapper<Object, Text, SongDataPoint, Text> {
    public void map(Object key, Text value, Context context)
            throws InterruptedException, IOException {
        String str = value.toString();
        // Reading Line one by one from the input CSV.
        String split[] = str.split(",");

        String trackId = split[0];
        String title = split[1];
        String artistName = split[2];
        SongDataPoint songDataPoint = 
                new SongDataPoint(new Text(trackId), new Text(title), 
                        new Text(artistName));
        context.write(songDataPoint, new Text());
        }
    }


public static class MapReduceKMeansReducer extends
Reducer<SongDataPoint, Text, Text, NullWritable> {
    public void reduce(SongDataPoint key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        sb.append(key.getTrackId()).append("\t").
        append(key.getTitle()).append("\t")
        .append(key.getArtistName()).append("\t");

        String write = sb.toString();

        context.write(new Text(write), NullWritable.get());
    }

}

public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();

    if (otherArgs.length != 2) {
        System.err
                .println("Usage:<CsV Out Path> <Final Out Path>");
        System.exit(2);
    }



    Job job = new Job(conf, "Song Data Trial");
    job.setJarByClass(MapReduceKMeans.class);
    job.setMapperClass(MapReduceKMeansMapper.class);
    job.setReducerClass(MapReduceKMeansReducer.class);
    job.setOutputKeyClass(SongDataPoint.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

当我调试代码时，会读取CSV文件中的所有行，但它根本不会进入reduce作业。

我也使用SongDataPoint作为我的自定义可写。

其代码如下。

public class SongDataPoint implements WritableComparable<SongDataPoint> {

Text trackId;
Text title;
Text artistName;

public SongDataPoint() {
    this.trackId = new Text();
    this.title = new Text();
    this.artistName = new Text();
}

public SongDataPoint(Text trackId, Text title, Text artistName) {
    this.trackId = trackId;
    this.title = title;
    this.artistName = artistName;
}

@Override
public void readFields(DataInput in) throws IOException {
    this.trackId.readFields(in);
    this.title.readFields(in);
    this.artistName.readFields(in);
}

@Override
public void write(DataOutput out) throws IOException {

}

public Text getTrackId() {
    return trackId;
}

public void setTrackId(Text trackId) {
    this.trackId = trackId;
}

public Text getTitle() {
    return title;
}

public void setTitle(Text title) {
    this.title = title;
}

public Text getArtistName() {
    return artistName;
}

public void setArtistName(Text artistName) {
    this.artistName = artistName;
}


@Override
public int compareTo(SongDataPoint o) {
    // TODO Auto-generated method stub
    int compare = getTrackId().compareTo(o.getTrackId());
    return compare;
}

}

感谢任何帮助。感谢。

Answer 1

根据Driver的输出键类类是SongDataPoint.class，输出值类是Text.class，但实际上你在Reducer中将Text作为键，在Reducer中写为Nullwritable。

Answer 2

您还应该指定Mapper输出值，如下所示。

job.setMapOutputKeyClass(SongDataPoint.class);
job.setMapOutputValueClass(Text.class);

Answer 3

我的CustomWritable类中的写入方法错误地留空了。它在写完正确的代码后解决了这个问题。

public void write(DataOutput out) throws IOException {

}

地图完成后，Reduce无法启动

3 个答案: