Question

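I use a HashMap in one of my classes, and I call that class from my mapper, so each mapper now has its own HashMap. Can I send all of these HashMaps to a single reducer? Each HashMap uses a file name as the key and a Set as the value, so each HashMap contains one file name and one Set. I want to collect all the HashMaps that share the same file name, union their values (the Sets), and then write the resulting HashMap to a file on HDFS.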

Answer 1

Yes, you can do this. If your mapper produces its output as a HashMap, you can use Hadoop's MapWritable as the mapper's output value type. For example:

public class MyMapper extends Mapper<LongWritable, Text, Text, MapWritable>

You have to convert your HashMap to a MapWritable:

// Copy every non-null entry into a MapWritable, wrapping keys and values in Text.
MapWritable mapWritable = new MapWritable();
for (Map.Entry<String, String> entry : yourHashMap.entrySet()) {
    if (entry.getKey() != null && entry.getValue() != null) {
        mapWritable.put(new Text(entry.getKey()), new Text(entry.getValue()));
    }
}
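The conversion above assumes String values, while the question stores a Set per file name. One possible workaround, sketched here in plain Java, is to flatten each Set into a single delimited String before wrapping it in Text. The class `SetTextCodec`, its method names, and the tab delimiter are all hypothetical choices for illustration and are not part of Hadoop; the sketch also assumes that no set element contains the delimiter.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not a Hadoop class): turns a Set<String> into one
// delimited String and back, so it can be carried inside a Text value.
class SetTextCodec {
    // Assumed delimiter; choose a character that cannot occur in the elements.
    static final String DELIMITER = "\t";

    // Joins the elements in the set's iteration order.
    static String encode(Set<String> values) {
        return String.join(DELIMITER, values);
    }

    // Splits the string back into a sorted set; an empty string decodes to an empty set.
    static Set<String> decode(String encoded) {
        Set<String> out = new TreeSet<>();
        if (encoded.isEmpty()) {
            return out;
        }
        out.addAll(Arrays.asList(encoded.split(DELIMITER, -1)));
        return out;
    }
}
```

With such a helper, the loop above could store `new Text(SetTextCodec.encode(entry.getValue()))` for a `Map<String, Set<String>>`, and the reducer could call `decode` on the text it reads back.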

Then write the MapWritable to your context:

ctx.write(new Text("my_key"), mapWritable);

In the Reducer class, MapWritable is then the input value type:

public class MyReducer extends Reducer<Text, MapWritable, Text, Text>

public void reduce(Text key, Iterable<MapWritable> values, Context ctx) throws IOException, InterruptedException

Then iterate over the maps and extract the values however you need. For example:

// Each element of values is the MapWritable emitted by one mapper.
for (MapWritable entry : values) {
    for (Map.Entry<Writable, Writable> extractData : entry.entrySet()) {
        // your logic for the data will go here.
    }
}
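The logic placeholder in the loop above is where the per-file-name sets would be unioned. As a minimal sketch of that merge step, written in plain Java so it runs without Hadoop, the following assumes each mapper's map has already been decoded into a `Map<String, Set<String>>`. The class `FileSetMerger` and the method `mergeAll` are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not a Hadoop class): a plain-Java stand-in for the
// merge a reducer would perform over the maps it receives.
class FileSetMerger {
    // Unions the sets from every per-mapper map, grouped by file name.
    static Map<String, Set<String>> mergeAll(List<Map<String, Set<String>>> perMapperMaps) {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map<String, Set<String>> oneMapperMap : perMapperMaps) {
            for (Map.Entry<String, Set<String>> e : oneMapperMap.entrySet()) {
                // Create the target set on first sight of a file name, then add all values.
                merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }
}
```

Inside an actual reducer the same `computeIfAbsent(...).addAll(...)` pattern would be applied while iterating the MapWritable entries, and the merged result written out through the context.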

The HashMap from each mapper should be used in a single reducer
