I am using Hadoop 2.7.2 and Nutch 1.12. When I run a Nutch job on Hadoop, I get the following error during the Nutch parse phase.
17/10/03 14:01:52 INFO mapreduce.Job: Running job: job_1506573729189_0223
17/10/03 14:02:05 INFO mapreduce.Job: Job job_1506573729189_0223 running in uber mode : false
17/10/03 14:02:05 INFO mapreduce.Job: map 0% reduce 0%
17/10/03 14:02:15 INFO mapreduce.Job: map 1% reduce 0%
17/10/03 14:02:18 INFO mapreduce.Job: map 2% reduce 0%
17/10/03 14:02:21 INFO mapreduce.Job: map 3% reduce 0%
17/10/03 14:02:24 INFO mapreduce.Job: map 4% reduce 0%
17/10/03 14:02:27 INFO mapreduce.Job: map 8% reduce 0%
17/10/03 14:02:30 INFO mapreduce.Job: map 12% reduce 0%
17/10/03 14:03:35 INFO mapreduce.Job: Task Id : attempt_1506573729189_0223_m_000000_0, Status : FAILED
Error: Java heap space
17/10/03 14:03:36 INFO mapreduce.Job: map 11% reduce 0%
17/10/03 14:03:46 INFO mapreduce.Job: map 14% reduce 0%
17/10/03 14:04:48 INFO mapreduce.Job: Task Id : attempt_1506573729189_0223_m_000000_1, Status : FAILED
Error: Java heap space
To get rid of the error above, I added the following property to Hadoop's mapred-site.xml.
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4096m</value>
</property>
After adding this property, I get a new error, shown below.
17/10/04 11:06:15 INFO mapreduce.Job: Running job: job_1507094901386_0004
17/10/04 11:06:26 INFO mapreduce.Job: Job job_1507094901386_0004 running in uber mode : false
17/10/04 11:06:26 INFO mapreduce.Job: map 0% reduce 0%
17/10/04 11:06:28 INFO mapreduce.Job: Task Id : attempt_1507094901386_0004_m_000000_0, Status : FAILED
Container [pid=8299,containerID=container_1507094901386_0004_01_000002] is running beyond virtual memory limits. Current usage: 97.6 MB of 1 GB physical memory used; 5.7 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1507094901386_0004_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 8299 8297 8299 8299 (bash) 0 0 17092608 707 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx4096m -Djava.io.tmpdir=/tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1507094901386_0004/container_1507094901386_0004_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1507094901386_0004/container_1507094901386_0004_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.1.63 33402 attempt_1507094901386_0004_m_000000_0 2 1>/usr/local/hadoop/logs/userlogs/application_1507094901386_0004/container_1507094901386_0004_01_000002/stdout 2>/usr/local/hadoop/logs/userlogs/application_1507094901386_0004/container_1507094901386_0004_01_000002/stderr
|- 8303 8299 8299 8299 (java) 73 4 6100692992 24291 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx4096m -Djava.io.tmpdir=/tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1507094901386_0004/container_1507094901386_0004_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1507094901386_0004/container_1507094901386_0004_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.1.63 33402 attempt_1507094901386_0004_m_000000_0 2
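The figures in the kill message can be reproduced from the process-tree dump. The following is only a rough sketch: it assumes a 4096-byte page size and that yarn.nodemanager.vmem-pmem-ratio is still at its default of 2.1, neither of which is stated in the log. The 1 GB container size is taken from the message itself and matches the default mapreduce.map.memory.mb of 1024.

```python
# Figures copied from the process-tree dump in the log.
PAGE_SIZE = 4096        # bytes per page; assumed (typical on x86-64 Linux)
VMEM_PMEM_RATIO = 2.1   # assumed default of yarn.nodemanager.vmem-pmem-ratio
CONTAINER_MB = 1024     # "1 GB physical memory" reported in the kill message

processes = [
    # (name, VMEM_USAGE in bytes, RSSMEM_USAGE in pages)
    ("bash", 17092608, 707),
    ("java", 6100692992, 24291),
]

# YARN sums usage over the whole process tree of the container.
vmem_bytes = sum(vmem for _, vmem, _ in processes)
rss_bytes = sum(pages for _, _, pages in processes) * PAGE_SIZE

vmem_limit_gb = CONTAINER_MB * VMEM_PMEM_RATIO / 1024
vmem_used_gb = vmem_bytes / 1024**3
rss_used_mb = rss_bytes / 1024**2

print(f"virtual limit: {vmem_limit_gb:.1f} GB")
print(f"virtual used:  {vmem_used_gb:.1f} GB")
print(f"physical used: {rss_used_mb:.1f} MB")
```

Under these assumptions the sketch prints 2.1 GB, 5.7 GB and 97.6 MB, the same numbers as the message. In other words, the -Xmx4096m heap made the JVM reserve far more virtual address space than a 1 GB container is allowed, even though very little physical memory was in use.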
I got rid of these errors by setting the following properties in mapred-site.xml. I also removed the 'mapred.child.java.opts' property shown above.
<property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>
</property>
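As a quick sanity check on these values, the heap size in each java.opts setting can be compared with the container size it has to fit into. This is a minimal sketch, not part of Hadoop: the helper name xmx_mb is hypothetical and it only understands -Xmx values given in megabytes (m) or gigabytes (g).

```python
import re


def xmx_mb(java_opts: str) -> int:
    """Return the -Xmx heap size in MB (hypothetical helper, m/g suffixes only)."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        raise ValueError(f"no -Xmx setting found in {java_opts!r}")
    value = int(match.group(1))
    return value * 1024 if match.group(2).lower() == "g" else value


# (container size in MB, java.opts) as set in mapred-site.xml
settings = {
    "map": (4096, "-Xmx3072m"),
    "reduce": (8192, "-Xmx6144m"),
}

headroom = {}
for task, (container_mb, opts) in settings.items():
    heap_mb = xmx_mb(opts)
    headroom[task] = container_mb - heap_mb
    print(
        f"{task}: heap {heap_mb} MB in a {container_mb} MB container "
        f"({heap_mb / container_mb:.0%}), {headroom[task]} MB left for non-heap memory"
    )
```

With the values above, both heaps come out at 75% of their container, leaving 1024 MB and 2048 MB for non-heap memory. By contrast, the earlier attempt requested a 4096 MB heap while the container was still 1 GB.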
But even now, the job gets stuck at the last line below.
17/10/11 17:00:32 INFO mapreduce.Job: Running job: job_1507721357521_0001
17/10/11 17:00:56 INFO mapreduce.Job: Job job_1507721357521_0001 running in uber mode : false
17/10/11 17:00:56 INFO mapreduce.Job: map 0% reduce 0%
17/10/11 17:01:08 INFO mapreduce.Job: map 100% reduce 0%