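How can a reducer (with Text keys and Iterable<MapWritable> values) write all of its maps to a sequence file so that the grouping by key is preserved? For example, suppose the mapper emits records to the reducer that look like this: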
<"dog", {<"name", "Fido">, <"pure bred?", "false">, <"type", "mutt">}>
<"cat", {<"name", "Felix">, <"color", "black">, <"origin", "film">, <"date", "1919">}>
<"dog", {<"name", "Lassie">, <"type", "collie">, <"origin", "short story">}>
I want the sequence file to be written as:
key = "dog"
value = {
{<"name", "Fido">, <"pure bred?", "false">, <"type", "mutt">},
{<"name", "Lassie">, <"type", "collie">, <"origin", "short story">}
}
key = "cat"
value = {
{<"name", "Felix">, <"color", "black">, <"origin", "film">, <"date", "1919">}
}
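I suspect I need to create a custom output value class that implements Writable, but I'm not sure how, because as far as I know, Java collections can't be written to sequence files directly. I want to do this so that the next map/reduce stage reads all of the maps associated with each key as a single unit.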
TIA,
Answer:
As you suspected, you can create a custom Writable that extends ArrayWritable:
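```java
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.MapWritable;

// ArrayWritable must know its element class to deserialize, so the no-arg constructor fixes it
public class MapWritableArray extends ArrayWritable {
    public MapWritableArray() {
        super(MapWritable.class);
    }
}
```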
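To make the idea concrete without a Hadoop dependency, here is a small sketch of the length-prefixed layout that `ArrayWritable` uses on the wire: an element count followed by each element's own serialized bytes. The class name `GroupedRecordCodec` and the string-only maps are hypothetical simplifications for illustration, not Hadoop API; real `MapWritable` entries carry their own type information. The point is that a whole list of maps can travel as one value, provided the reader knows the element layout in advance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free illustration of a length-prefixed encoding
// similar in spirit to ArrayWritable: an int count, then each element.
class GroupedRecordCodec {

    // Layout: record count, then per record: entry count, then key/value pairs.
    static void write(List<Map<String, String>> records, DataOutputStream out)
            throws IOException {
        out.writeInt(records.size());
        for (Map<String, String> record : records) {
            out.writeInt(record.size());
            for (Map.Entry<String, String> e : record.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
    }

    // Reads back exactly what write() produced. The reader must know the element
    // layout up front; in Hadoop, ArrayWritable gets it from its value class.
    static List<Map<String, String>> read(DataInputStream in) throws IOException {
        int count = in.readInt();
        List<Map<String, String>> records = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int size = in.readInt();
            Map<String, String> record = new LinkedHashMap<>();
            for (int j = 0; j < size; j++) {
                String key = in.readUTF();
                String value = in.readUTF();
                record.put(key, value);
            }
            records.add(record);
        }
        return records;
    }

    // Serializes the records to a byte array and decodes them again.
    static List<Map<String, String>> roundTrip(List<Map<String, String>> records) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                write(records, out);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Round-tripping the two "dog" records from the question should return an equal list, with both maps intact and in order.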
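Then, in your reducer, accumulate the MapWritable values from the iterable into an array. Copy each value as you go, because Hadoop reuses the same object on every iteration and only changes its contents. Something like this (completely untested, not compile-checked, and unoptimized):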
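```java
@Override
protected void reduce(Text key, Iterable<MapWritable> values, Context context)
        throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    List<MapWritable> valList = new ArrayList<MapWritable>();
    for (MapWritable value : values) {
        // Hadoop reuses one MapWritable instance across iterations, so store a copy
        MapWritable copy = ReflectionUtils.newInstance(MapWritable.class, conf);
        ReflectionUtils.copy(conf, value, copy);
        valList.add(copy);
    }
    MapWritableArray mapWritableArray = new MapWritableArray();
    mapWritableArray.set(valList.toArray(new MapWritable[0]));
    context.write(key, mapWritableArray);
}
```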
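The copy inside the loop matters because the reduce-side iterator can hand back the same value object on each step, refilling its contents. The plain-Java sketch below imitates that behavior with a hypothetical `ReusingIterable` (not a Hadoop class; it stands in for the iterator Hadoop passes to `reduce`) and compares storing references with storing copies.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's value iterator: next() returns the SAME
// mutable map every time, only clearing and refilling its contents.
class ReusingIterable implements Iterable<Map<String, String>> {
    private final List<Map<String, String>> source;

    ReusingIterable(List<Map<String, String>> source) {
        this.source = source;
    }

    @Override
    public Iterator<Map<String, String>> iterator() {
        Map<String, String> shared = new LinkedHashMap<>();
        Iterator<Map<String, String>> it = source.iterator();
        return new Iterator<Map<String, String>>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public Map<String, String> next() {
                shared.clear();
                shared.putAll(it.next());
                return shared;
            }
        };
    }
}

class CopyDemo {
    // Stores references as-is: every element ends up being the same object.
    static List<Map<String, String>> collectWithoutCopy(Iterable<Map<String, String>> values) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> v : values) {
            out.add(v);
        }
        return out;
    }

    // Stores a fresh copy of each value, analogous to ReflectionUtils.copy in the reducer.
    static List<Map<String, String>> collectWithCopy(Iterable<Map<String, String>> values) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> v : values) {
            out.add(new LinkedHashMap<>(v));
        }
        return out;
    }
}
```

Without the copy, every list element is the same object and holds only the last record; with the copy, each record survives.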