I am trying to run a Hadoop program on a large text dataset (~3.1 TB).
I keep getting this error, but I cannot see any logs:
15/04/29 13:31:30 INFO mapreduce.Job: map 86% reduce 3%
15/04/29 13:33:33 INFO mapreduce.Job: map 87% reduce 3%
15/04/29 13:35:34 INFO mapreduce.Job: map 88% reduce 3%
15/04/29 13:37:34 INFO mapreduce.Job: map 89% reduce 3%
15/04/29 13:39:33 INFO mapreduce.Job: map 90% reduce 3%
15/04/29 13:41:27 INFO mapreduce.Job: map 91% reduce 3%
15/04/29 13:42:51 INFO mapreduce.Job: Task Id : attempt_1430221604005_0004_m_018721_0, Status : FAILED
Error: Java heap space
15/04/29 13:43:03 INFO mapreduce.Job: Task Id : attempt_1430221604005_0004_m_018721_1, Status : FAILED
Error: Java heap space
15/04/29 13:43:21 INFO mapreduce.Job: Task Id : attempt_1430221604005_0004_m_018721_2, Status : FAILED
Error: Java heap space
15/04/29 13:43:23 INFO mapreduce.Job: map 92% reduce 3%
15/04/29 13:43:53 INFO mapreduce.Job: map 100% reduce 100%
15/04/29 13:44:00 INFO mapreduce.Job: Job job_1430221604005_0004 failed with state FAILED due to: Task failed task_1430221604005_0004_m_018721
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/04/29 13:44:00 INFO mapreduce.Job: Counters: 40
File System Counters
FILE: Number of bytes read=1671885418232
FILE: Number of bytes written=3434806868906
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2421645776312
HDFS: Number of bytes written=0
HDFS: Number of read operations=54123
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed map tasks=4
Killed map tasks=53
Killed reduce tasks=13
Launched map tasks=18098
Launched reduce tasks=13
Other local map tasks=3
Data-local map tasks=18095
Total time spent by all maps in occupied slots (ms)=833322750
Total time spent by all reduces in occupied slots (ms)=179324736
Total time spent by all map tasks (ms)=833322750
Total time spent by all reduce tasks (ms)=44831184
Total vcore-seconds taken by all map tasks=833322750
Total vcore-seconds taken by all reduce tasks=44831184
Total megabyte-seconds taken by all map tasks=1644979108500
Total megabyte-seconds taken by all reduce tasks=353987028864
Map-Reduce Framework
Map input records=4341029640
Map output records=3718782624
Map output bytes=1756332044946
Map output materialized bytes=1769982618200
Input split bytes=2694367
Combine input records=0
Spilled Records=7203900023
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=10688027
CPU time spent (ms)=391899480
Physical memory (bytes) snapshot=15069669965824
Virtual memory (bytes) snapshot=61989010124800
Total committed heap usage (bytes)=17448162033664
File Input Format Counters
Bytes Read=2421643081945
The map phase takes more than 3 hours, and it is hard to debug because this is the only output I can see.
I have a cluster of 10 servers, each with 24 GB of RAM, configured with:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>computer61:8021</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1974</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>7896</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1580m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx6320m</value>
  </property>
</configuration>
I also added the line export HADOOP_HEAPSIZE=8192 to the hadoop-env.sh file, but nothing changed.
I know this is an old question, but I have applied the solutions recommended in some 50 posts without any improvement.
The same code works fine when I run it on a smaller dataset (~1 TB).
At the very least, do you know how I can get at the logs so I can see exactly where the error occurred?
Thanks.
UPDATE
I managed to look at the logs before they were deleted. Basically, the error is:
2015-04-29 18:23:45,719 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 25874428(103497712); length = 339969/6553600
2015-04-29 18:23:47,110 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2015-04-29 18:23:47,676 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:201)
at java.lang.String.substring(String.java:1956)
at java.lang.String.trim(String.java:2865)
at analysis.MetaDataMapper.map(MetaDataMapper.java:109)
at analysis.MetaDataMapper.map(MetaDataMapper.java:21)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Answer (score: 2)
Reducing the buffer size may help. By default, Hadoop buffers 70% of the data coming from the mappers before it starts sorting, and for a large dataset this can be too much. You can reduce this input buffer percentage by adding the following property to mapred-site.xml:
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.20</value>
</property>
I have set the value to 20% here, but you may want to lower it further depending on your dataset and the amount of memory available.
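Two side notes, offered as general suggestions rather than anything specific to your job. First, since your configuration runs on YARN (Hadoop 2.x), mapred.job.shuffle.input.buffer.percent is the older, deprecated property name; as far as I know the current equivalent is mapreduce.reduce.shuffle.input.buffer.percent, so the same change with the newer name would look like:

<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <value>0.20</value>
</property>

Second, regarding how to get at the logs: if YARN log aggregation is enabled (yarn.log-aggregation-enable), you should be able to retrieve the full task logs after the job finishes with yarn logs -applicationId application_1430221604005_0004 (the application ID mirrors the job ID), or browse the individual attempt logs through the ResourceManager / JobHistory web UI.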