I'm new to Hadoop, and I have been struggling for two days to figure out why output.collect is not collecting the right value.
Let me explain. To simplify, my map method passes a row of type MyObject to output.collect, where MyObject is a class I created:
public class MyObject {
    private boolean original;
    private boolean split;
    ....
}
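When I run only the mapper in debug mode, the mapper output (output.collect) always has the row's original attribute set to false (the default value for a boolean), even though I set the original attribute of row (MyObject) to true. I don't understand what is wrong with output.collect.
Any help would be welcome. Thanks!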
Answer 0 (score: 1)
Thanks for your answer, Matt! The problem actually came from my implementation of readFields and write, because I was not calling:
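// in write(DataOutput out): serialize both fields
_original.write(out);
_split.write(out);
// in readFields(DataInput in): recreate both fields, then read them back in the same order
_original = new BooleanWritable();
_split = new BooleanWritable();
_original.readFields(in);
_split.readFields(in);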
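For readers with the same symptom, here is a minimal sketch of why the flag comes back as false. It assumes the value class keeps plain boolean fields, as in the question, rather than BooleanWritable fields. The class name MyObjectSketch and the roundTrip helper are hypothetical. The sketch deliberately does not implement org.apache.hadoop.io.Writable so that it compiles without Hadoop on the classpath, but write and readFields use the same signatures as that interface. Hadoop typically serializes a collected value with write and rebuilds it with readFields on a fresh instance, so any field that these two methods skip is left at its default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for MyObject. In a real job this class would declare
// "implements org.apache.hadoop.io.Writable"; that is omitted here (assumption)
// so the sketch runs with the JDK alone.
public class MyObjectSketch {
    private boolean original;
    private boolean split;

    // No-arg constructor: a deserializer creates an empty instance first.
    public MyObjectSketch() {}

    public MyObjectSketch(boolean original, boolean split) {
        this.original = original;
        this.split = split;
    }

    // Every field that must survive serialization has to be written here.
    public void write(DataOutput out) throws IOException {
        out.writeBoolean(original);
        out.writeBoolean(split);
    }

    // Fields must be read back in the same order they were written.
    // If this method reads nothing, both booleans stay false.
    public void readFields(DataInput in) throws IOException {
        original = in.readBoolean();
        split = in.readBoolean();
    }

    public boolean isOriginal() { return original; }

    public boolean isSplit() { return split; }

    // Serializes a value to bytes and reads it into a fresh instance,
    // imitating what happens to a value after it is collected.
    public static MyObjectSketch roundTrip(MyObjectSketch value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            value.write(new DataOutputStream(bytes));
            MyObjectSketch copy = new MyObjectSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MyObjectSketch copy = roundTrip(new MyObjectSketch(true, false));
        System.out.println("original=" + copy.isOriginal() + ", split=" + copy.isSplit());
    }
}
```

A round trip like this can be run as a plain unit test, without starting a job, to check that a value class restores every field it is supposed to carry.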