I wrote a map method that reads the map output of the wordcount example [1]. It no longer uses the IdentityMapper that MapReduce provides, but this is the only way I found to build a working IdentityMapper for Wordcount. The only problem is that this mapper takes much longer than I would like, and I am starting to think I may be doing something redundant. Can anyone help me improve my WordCountIdentityMapper code?

[1] Identity mapper

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// MyMapper is my own base class; it is not shown here.
public class WordCountIdentityMapper extends MyMapper<LongWritable, Text, Text, IntWritable> {
    private Text word = new Text();

    // Each input line holds a word followed by its count.
    public void map(LongWritable key, Text value, Context context
                    ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        word.set(itr.nextToken());
        Integer val = Integer.valueOf(itr.nextToken());
        context.write(word, new IntWritable(val));
    }

    public void run(Context context) throws IOException, InterruptedException {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }
}

[2] Map class that generated the map output

public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (word, 1) for every token in the input line.
    public void map(LongWritable key, Text value, Context context
                    ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }

    public void run(Context context) throws IOException, InterruptedException {
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}

Thanks,
Answer 0 (score: 0)

The solution was to replace StringTokenizer with the indexOf() method. It works much better, and I get better performance.
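
The answer does not show the replacement code, so the following is only a minimal sketch of what indexOf()-based parsing could look like. It assumes each map-output line has the form "word<TAB>count" with a single tab separator, which is not confirmed by the question. The class name IndexOfLineParser and its methods are hypothetical, and the sketch is kept free of Hadoop types so it can run on its own.

```java
// Sketch only: parses one "word<TAB>count" line with indexOf() instead of StringTokenizer.
// The class and method names are hypothetical, and the tab separator is an assumption.
final class IndexOfLineParser {

    // Returns the position of the first tab, or throws if the line has none.
    private static int separatorIndex(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("No tab separator in line: " + line);
        }
        return tab;
    }

    // Everything before the separator is the word.
    static String parseWord(String line) {
        return line.substring(0, separatorIndex(line));
    }

    // Everything after the separator is the count.
    static int parseCount(String line) {
        return Integer.parseInt(line.substring(separatorIndex(line) + 1).trim());
    }

    public static void main(String[] args) {
        String line = "hadoop\t3";
        System.out.println(parseWord(line) + " -> " + parseCount(line));
    }
}
```

Inside the mapper, the same idea would presumably be applied to value.toString(), setting the reused Text with word.set(...) and, to avoid allocating a new object per record, keeping one IntWritable field and updating it with its set(int) method before context.write(...).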