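I am getting an OOM exception (Java heap space) in the reduce child task. In the reducer, I append all the values to a StringBuilder, which becomes the output of the reducer. The number of values is not that large. I tried increasing mapred.reduce.child.java.opts to 512M and 1024M, but that did not help. The reducer code is below.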
    StringBuilder adjVertexStr = new StringBuilder();
    long itcount = 0;
    while (values.hasNext()) {
        adjVertexStr.append(values.next().toString()).append(" ");
        itcount++;
    }
    log.info("Size of iterator: " + itcount);
    multipleOutputs.getCollector("vertex", reporter).collect(key, new Text(""));
    multipleOutputs.getCollector("adjvertex", reporter).collect(adjVertexStr, new Text(""));
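I get the exception at three places in the code above.
Some sample iterator sizes are: 238695, 1, 13, 673, 1, 1, and so on. These are not very large values. Why do I keep getting the OOM exception? Any help would be valuable to me.
Stack trace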
2012-10-10 21:15:03,929 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 238695
2012-10-10 21:15:04,190 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 1
2012-10-10 21:15:04,190 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 1
2012-10-10 21:15:04,190 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 13
2012-10-10 21:15:04,190 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 1
2012-10-10 21:15:04,191 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 1
2012-10-10 21:15:04,193 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 673
2012-10-10 21:15:04,195 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 1
2012-10-10 21:15:04,196 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 1
2012-10-10 21:15:04,196 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 1
2012-10-10 21:15:04,196 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 1
2012-10-10 21:15:04,196 INFO partitioning.UndirectedGraphPartitioner: Size of iterator: 1
2012-10-10 21:15:09,856 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-10-10 21:15:09,916 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2012-10-10 21:15:09,916 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hduser for UID 2006 from the native implementation
2012-10-10 21:15:09,922 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at partitioning.UndirectedGraphPartitioner$Reduce.reduce(UndirectedGraphPartitioner.java:106)
at partitioning.UndirectedGraphPartitioner$Reduce.reduce(UndirectedGraphPartitioner.java:82)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
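Answer 0 (score: 2)
So for your example, you want to output the values for a particular key as a space-separated list (as the output key), with an empty Text as the output value.
Rather than building that list in memory, pass each reduce key/value pair straight to the output format, as follows (this goes in your reducer code):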
    // Old mapred API: values is an Iterator<Text>, so walk it with hasNext()/next().
    while (values.hasNext()) {
        multipleOutputs.getCollector("adjvertex", reporter)
                .collect(key, values.next());
    }
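The actual RecordWriter then uses the key as a logical trigger:
When the key passed in differs from the previously passed key, the record currently being written is closed (for example, by writing a tab followed by a newline). The previous key is updated, and the new value is written to the output stream.
If the key is the same as the previous key, a space is written, followed by the value.
In the record writer's close method, apply the same logic as when a new key is passed in (write a tab followed by a newline).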
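The three rules above can be sketched in plain Java. This is only an illustration under stated assumptions: `GroupedValueWriter` is a hypothetical name, it writes to a `java.io.Writer` and takes `String` keys and values instead of implementing Hadoop's `RecordWriter` interface, and wiring it into a custom output format is left out.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Hypothetical sketch: streams one line per key, formatted as
 * "v1 v2 v3<TAB><NEWLINE>", without ever holding all values in memory.
 */
class GroupedValueWriter implements AutoCloseable {
    private final Writer out;
    private String previousKey; // null until the first record arrives

    GroupedValueWriter(Writer out) {
        this.out = out;
    }

    void write(String key, String value) throws IOException {
        if (key.equals(previousKey)) {
            out.write(' ');      // same key: separate from the previous value
        } else {
            if (previousKey != null) {
                endRecord();     // key changed: close the record in progress
            }
            previousKey = key;   // String is immutable, so no copy is needed here
        }
        out.write(value);
    }

    private void endRecord() throws IOException {
        out.write("\t\n");       // tab (empty value column) followed by a newline
    }

    @Override
    public void close() throws IOException {
        if (previousKey != null) {
            endRecord();         // same logic as seeing a new key
            previousKey = null;
        }
        out.close();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        try (GroupedValueWriter writer = new GroupedValueWriter(sink)) {
            writer.write("v1", "a");
            writer.write("v1", "b");
            writer.write("v2", "c");
        }
        System.out.print(sink); // two lines: "a b<TAB>" and "c<TAB>"
    }
}
```

Because each value is written as soon as it arrives, memory use no longer depends on how many values a key has.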
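Hope that makes sense. The one thing to watch out for is a custom group comparator, which would cause the previous-key comparison in the record writer to fail. Also remember to make a deep copy of the key when you update the variable that tracks the previous key.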
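The deep-copy advice matters because Hadoop reuses key objects between calls, so a stored reference always compares equal to the current key. A minimal sketch of the effect follows, where `MutableKey` is a hypothetical stand-in for a reusable key such as `Text` and `countKeyChanges` is an illustrative helper, not part of any Hadoop API:

```java
class KeyCopyDemo {
    /** Hypothetical stand-in for a reusable, mutable Hadoop key object. */
    static final class MutableKey {
        private String value = "";

        void set(String v) {
            value = v;
        }

        MutableKey copy() {
            MutableKey c = new MutableKey();
            c.set(value);
            return c;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).value.equals(value);
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }
    }

    /** Counts detected key changes, tracking the previous key by copy or by reference. */
    static int countKeyChanges(String[] keys, boolean deepCopy) {
        MutableKey reused = new MutableKey(); // one instance reused for every record
        MutableKey previous = null;
        int changes = 0;
        for (String k : keys) {
            reused.set(k);
            if (previous != null && !reused.equals(previous)) {
                changes++;
            }
            // Keeping the reference means previous and reused are the same object.
            previous = deepCopy ? reused.copy() : reused;
        }
        return changes;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "a", "b", "c"};
        System.out.println(countKeyChanges(keys, true));  // copy: changes are detected
        System.out.println(countKeyChanges(keys, false)); // reference: changes are missed
    }
}
```

With a real `Text` key, the copy constructor `new Text(key)` should serve the same purpose as `copy()` here.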