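I have a text input file in which each line contains a URL followed by a variable number of keywords (the full file is listed further down).
I need to convert this into output with one line per keyword, followed by the URLs that keyword appears with.
My mapper class looks like this: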
public class KeywordsMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Text urlkey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] line = value.toString().split(" ");
        ArrayList<String> keywords = new ArrayList<String>();
        for (String sequence : line) {
            if (sequence.endsWith(".com")) {
                // url
                urlkey.set(sequence);
            } else {
                // keyword
                keywords.add(sequence);
            }
        }
        for (String keyword : keywords) {
            context.write(new Text(keyword), urlkey);
        }
    }
}
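To see what the mapper emits without running a cluster, its tokenizing logic can be sketched in plain Java. This is only an illustration: `MapStepSketch` and `mapLine` are hypothetical names, and the sketch returns the pairs in a list where the real mapper calls `context.write`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone helper; not part of the Hadoop job.
class MapStepSketch {
    // Follows the logic of KeywordsMapper.map() for a single line: the token
    // ending in ".com" is the URL, every other token is a keyword, and each
    // keyword is paired with that URL.
    static List<Map.Entry<String, String>> mapLine(String line) {
        String url = "";
        List<String> keywords = new ArrayList<>();
        for (String token : line.split(" ")) {
            if (token.endsWith(".com")) {
                url = token;
            } else {
                keywords.add(token);
            }
        }
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String keyword : keywords) {
            pairs.add(new SimpleEntry<>(keyword, url));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints one tab-separated (keyword, url) pair per line.
        for (Map.Entry<String, String> pair : mapLine("amazon.com shoes books jeans")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Each input line therefore turns into as many (keyword, URL) pairs as it has keywords. Merging the pairs that share a keyword is left to the reduce step.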
My reducer/combiner class looks like this:
public class KeywordReducer extends Reducer<Text, Iterable<Text>, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        String body = "";
        for (Text part : values) {
            body = body + " " + part.toString() + " ";
        }
        context.write(key, new Text(body));
    }
}
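What this reduce step is meant to achieve is grouping all URLs emitted for one keyword and joining them into a single value. It can be sketched without Hadoop as below. `ReduceStepSketch`, `group`, and `reduce` are hypothetical names. The sketch uses `String.join` instead of the string concatenation above, and a `TreeMap` stands in for the sort-and-group work that Hadoop's shuffle does in a real job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; in a real job the framework does the grouping.
class ReduceStepSketch {
    // Groups (keyword, url) pairs by keyword, with keys kept in sorted order.
    static Map<String, List<String>> group(String[][] pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // The per-key work the reducer is meant to do: join all URLs into one value.
    static String reduce(List<String> urls) {
        return String.join(" ", urls);
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"books", "amazon.com"}, {"shoes", "amazon.com"},
            {"books", "target.com"}, {"books", "wegmans.com"}
        };
        // Prints one line per keyword: the keyword, a tab, then its URLs.
        for (Map.Entry<String, List<String>> e : group(pairs).entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```

In this sketch the URLs keep their insertion order. In an actual MapReduce job the order of values within a key is not guaranteed unless the job sorts them explicitly.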
The job looks like this:
public class KeywordJob extends Configured implements Tool {
    @Override
    public int run(String[] arg0) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(getClass());
        job.setJobName(getClass().getSimpleName());
        FileInputFormat.addInputPath(job, new Path(arg0[0]));
        FileOutputFormat.setOutputPath(job, new Path(arg0[1]));
        job.setMapperClass(KeywordsMapper.class);
        job.setCombinerClass(KeywordReducer.class);
        job.setReducerClass(KeywordReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int rc = ToolRunner.run(new KeywordJob(), args);
        System.exit(rc);
    }
}
The output I currently get contains the same keyword on several lines, with the URLs not merged.
The input file is:
yahoo.com news sports finance email celebrity
amazon.com shoes books jeans
google.com news finance email search
microsoft.com operating-system productivity search
target.com shoes books jeans groceries
wegmans.com books groceries
facebook.com news social sports
linkedin.com news recruitment
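Question: how do I adjust my combiner/reducer to get the desired output? Is there a specific reason why the output contains multiple duplicate keys whose results are not merged?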
Answer 0 (score: 2)
Mark,
Your reduce() method is never being called.
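The reducer class definition should look like this:
public class KeywordReducer extends Reducer<Text, Text, Text, Text>
instead of
public class KeywordReducer extends Reducer<Text, Iterable<Text>, Text, Text>
because the second type parameter must match the map output value type (Text). With Iterable<Text> in that position, your reduce(Text, Iterable<Text>, Context) does not override the base class method, so Hadoop runs the default identity reduce() and writes every (keyword, URL) pair unmerged. The reduce() method signature itself is correct.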
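The effect can be reproduced without Hadoop. The sketch below is an illustration only: `BaseReducer` is a hypothetical stand-in for Hadoop's `Reducer` (it returns a string where the real class writes to a `Context`), and `drive` mimics the framework calling the reducer through its base type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hadoop's Reducer; only the shape of reduce() matters.
class BaseReducer<K, V> {
    // Default behaviour: identity, one output line per incoming value.
    String reduce(K key, Iterable<V> values) {
        StringBuilder out = new StringBuilder();
        for (V value : values) {
            out.append(key).append('\t').append(value).append('\n');
        }
        return out.toString();
    }

    // The framework only ever calls the base-typed method.
    String run(K key, Iterable<V> values) {
        return reduce(key, values);
    }
}

// Wrong: V is Iterable<String>, so the inherited method takes
// Iterable<Iterable<String>>. The method below is an overload, not an override.
class WrongReducer extends BaseReducer<String, Iterable<String>> {
    String reduce(String key, Iterable<String> values) {
        return key + "\t" + String.join(" ", values) + "\n";
    }
}

// Fixed: V is String, so this method really overrides the base method.
class FixedReducer extends BaseReducer<String, String> {
    @Override
    String reduce(String key, Iterable<String> values) {
        return key + "\t" + String.join(" ", values) + "\n";
    }
}

class OverrideDemo {
    // Calls through the raw base type, as a framework using reflection would.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String drive(BaseReducer reducer, String key, List<String> values) {
        return reducer.run(key, values);
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("amazon.com", "target.com");
        System.out.print(drive(new WrongReducer(), "books", urls)); // two unmerged lines
        System.out.print(drive(new FixedReducer(), "books", urls)); // one merged line
    }
}
```

Putting `@Override` on `WrongReducer.reduce` would turn the mistake into a compile-time error, which is a cheap way to catch this kind of mismatch in the real reducer as well.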
Hope this helps.