I am running MR jobs on a cluster (Hadoop 1.2.1).
My MR application first splits the input data into many partitions (128^2 ~ 512^2) in a first Map/Reduce phase, and then processes each partition in a second Map phase.
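For context, here is a minimal sketch of how the first (partitioning) phase is set up; the class and path names are placeholders rather than my real code, and I have only kept the part that controls the number of partitions:

    // Rough sketch of the first-phase job configuration (Hadoop 1.x old API).
    // Class name, paths and the partition count are illustrative placeholders.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class PartitionPhase {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(PartitionPhase.class);
            conf.setJobName("partition-phase");

            // Each reducer emits one partition, so the total number of
            // partitions (128^2 ~ 512^2 in my case) is controlled here.
            conf.setNumReduceTasks(128 * 128);

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }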
Since processing each partition requires a fairly large amount of memory, I increased the number of partitions (128^5 ~ 512^2). Now the job fails with the following error messages:
Error #1
Job initialization failed: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.TreeSet.<init>(TreeSet.java:124)
    at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProgress.java:105)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:745)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Error #2
Failure Info: Job initialization failed: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.StringBuffer.toString(StringBuffer.java:671)
    at org.apache.hadoop.fs.Path.toString(Path.java:252)
    at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:75)
    at org.apache.hadoop.mapred.JobInProgress.createSplits(JobInProgress.java:834)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:724)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
The messages seem to say that I need to increase the amount of memory for each map/reduce Java worker.
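If it helps, this is the kind of per-task setting I understand that message to be pointing at (the Hadoop 1.x mapred.child.java.opts property); the heap value below is only an illustrative sketch, not my actual configuration:

    import org.apache.hadoop.mapred.JobConf;

    public class ChildHeapExample {
        public static void main(String[] args) {
            // Heap size for each map/reduce child JVM (Hadoop 1.x property).
            // "-Xmx2048m" is just an example value, not what my cluster runs with.
            JobConf conf = new JobConf();
            conf.set("mapred.child.java.opts", "-Xmx2048m");
            System.out.println(conf.get("mapred.child.java.opts"));
        }
    }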
However, I cannot understand the root cause of the errors above, because the OOM messages do not come from my application code; they appear to come from Hadoop's internal engine code.
Where exactly are these errors coming from? Thanks.