I have two files in tabular form.
File 1:
key1 value1
key2 value2
...
File 2:
key1 value3
key2 value4
...
I want the reduce output to have the form:
key1 (value1-value3)/value1
key2 (value2-value4)/value2
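To make the target concrete, here is a small worked example of the per-key computation in plain Java. The names `RelativeChange` and `relativeChange` and the sample numbers are illustrative assumptions, not part of the original files.

```java
// Hypothetical helper illustrating the desired per-key output.
class RelativeChange {
    // Computes (v1 - v2) / v1, where v1 comes from file 1 and v2 from file 2.
    static double relativeChange(double v1, double v2) {
        if (v1 == 0.0) {
            throw new ArithmeticException("value from file 1 is zero; ratio undefined");
        }
        return (v1 - v2) / v1;
    }

    public static void main(String[] args) {
        // Assumed sample: key1 has 8 in file 1 and 2 in file 2.
        System.out.println("key1 " + relativeChange(8, 2)); // prints "key1 0.75"
    }
}
```

The result is a fraction, so the computation needs floating-point arithmetic; with integer types the division would truncate.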
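I had the map emit the key, with the value prefixed by a character that tells whether it came from file1 or file2, but I am not sure how to write the reduce stage.
My map method is: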
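public void map(LongWritable key, Text val, Context context) throws IOException, InterruptedException
{
    Text outputKey = new Text();
    Text outputValue = new Text();
    // Note: with the default text input format, this key is the line's byte offset, not key1/key2
    outputKey.set(key.toString());
    if ("A") // placeholder condition: should be true when the record comes from file1
    {
        outputValue.set("A," + val);
    }
    else
    {
        outputValue.set("B," + val);
    }
    context.write(outputKey, outputValue);
}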
Answer 0 (score: 1)
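It should be simple, since you have already tagged the values, although the setup was a bit confusing at first. I am assuming the emitted values look like A23 (for file1) and B139 (for file2). Snippet: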
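// The reducer's output value type must be declared as DoubleWritable
// (org.apache.hadoop.io.DoubleWritable) for this to compile.
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    double diff = 0;
    double denominator = 1;
    for (Text val : values) {
        String s = val.toString();
        if (s.startsWith("A")) {
            // Value from file1: it is both the minuend and the denominator
            denominator = Double.parseDouble(s.substring(1));
            diff += denominator;
        } else if (s.startsWith("B")) {
            // Value from file2: subtract it
            diff -= Double.parseDouble(s.substring(1));
        } else {
            // This block shouldn't be reached unless malformed values are emitted
            throw new IOException("Unexpected value for key " + key + ": " + s);
        }
    }
    // Floating-point division: integer division would truncate the ratio
    diff /= denominator;
    context.write(key, new DoubleWritable(diff));
}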
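The reduce logic can be exercised without a cluster. The sketch below restates it in plain Java over a list of tagged strings. `ReduceSketch` and `combine` are hypothetical names, and the sketch assumes (my assumption, not something stated in the answer) that each key receives exactly one A value and one B value. The pair is passed in both orders because the reducer should not rely on the order in which values arrive.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone version of the reduce step, for local testing only.
class ReduceSketch {
    // Takes values tagged "A<number>" (file 1) or "B<number>" (file 2)
    // and returns (a - b) / a.
    static double combine(List<String> taggedValues) {
        Double a = null;
        Double b = null;
        for (String s : taggedValues) {
            if (s.startsWith("A") && a == null) {
                a = Double.parseDouble(s.substring(1));
            } else if (s.startsWith("B") && b == null) {
                b = Double.parseDouble(s.substring(1));
            } else {
                // Unknown tag, or a second value carrying the same tag
                throw new IllegalArgumentException("Unexpected value: " + s);
            }
        }
        if (a == null || b == null) {
            throw new IllegalArgumentException("Need one value from each file");
        }
        return (a - b) / a;
    }

    public static void main(String[] args) {
        // The same pair in either arrival order gives the same result.
        System.out.println(combine(Arrays.asList("A8", "B2"))); // prints 0.75
        System.out.println(combine(Arrays.asList("B2", "A8"))); // prints 0.75
    }
}
```

Rejecting a second value with the same tag makes a key that is duplicated within one file fail loudly instead of silently skewing the result.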
Hope this helps. However, I think your approach will fail when key1 and key2 are equal.
Update:
The map should look like this to work with the reducer above:
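// Requires: import org.apache.hadoop.mapreduce.lib.input.FileSplit;
public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException {
    // Name of the input file this record was read from
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    // Each line is "key value", separated by whitespace
    String[] keyVal = val.toString().trim().split("\\s+");
    if (keyVal.length < 2) {
        return; // skip blank or malformed lines
    }
    // Use the record's own key, not the byte offset passed in as `key`
    Text outputKey = new Text(keyVal[0]);
    Text outputValue = new Text();
    if ("fileA".equals(fileName)) {
        outputValue.set("A" + keyVal[1]);
    } else {
        outputValue.set("B" + keyVal[1]);
    }
    context.write(outputKey, outputValue);
}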
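The parsing and tagging done by this mapper can also be checked in isolation. The following plain-Java sketch mirrors it. `TagSketch` and `tagLine` are hypothetical names, and `fileA` is taken from the snippet above as the assumed name of the first input file.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical stand-alone version of the map step, for local testing only.
class TagSketch {
    // Splits "key value" on whitespace and prefixes the value with
    // "A" if the line came from fileA, otherwise "B".
    static Map.Entry<String, String> tagLine(String fileName, String line) {
        String[] keyVal = line.trim().split("\\s+");
        if (keyVal.length < 2) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        String tag = "fileA".equals(fileName) ? "A" : "B";
        return new SimpleEntry<>(keyVal[0], tag + keyVal[1]);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = tagLine("fileA", "key1 23");
        System.out.println(e.getKey() + " -> " + e.getValue()); // prints "key1 -> A23"
    }
}
```

Keeping the record's own key as the output key is what lets both files' values for the same key meet in a single reduce call.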
Answer 1 (score: 0)
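I found NamedVector very helpful in this situation. It gives each value an identity, so you can perform the required operation on a value based on its "name".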