Nutch + Solr - indexer causes java.lang.OutOfMemoryError: Java heap space

Posted: 2014-09-07 09:35:58

Tags: java tomcat hadoop solr nutch

I have configured my 2 servers to run in distributed mode (with Hadoop); my crawl setup is Nutch 2.2.1 with HBase (as the storage backend) and Solr. Solr runs under Tomcat. The problem occurs every time I try the last step, i.e. when I index the data from HBase into Solr. The error [1] is thrown then. I tried adding CATALINA_OPTS (or JAVA_OPTS) like this:

CATALINA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -Xms1g -Xmx6000m -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled"

to Tomcat's catalina.sh script and ran the server with it, but it did not help. I also added the properties in [2] to the nutch-site.xml file, but it still ends with an OutOfMemoryError. Can you help me?

[1]

2014-09-06 22:52:50,683 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space 
    at java.util.Arrays.copyOf(Arrays.java:2367) 
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130) 
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114) 
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:587) 
    at java.lang.StringBuffer.append(StringBuffer.java:332) 
    at java.io.StringWriter.write(StringWriter.java:77) 
    at org.apache.solr.common.util.XML.escape(XML.java:204) 
    at org.apache.solr.common.util.XML.escapeCharData(XML.java:77) 
    at org.apache.solr.common.util.XML.writeXML(XML.java:147) 
    at org.apache.solr.client.solrj.util.ClientUtils.writeVal(ClientUtils.java:161) 
    at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:129) 
    at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:355) 
    at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:271) 
    at org.apache.solr.client.solrj.request.RequestWriter.getContentStream(RequestWriter.java:66) 
    at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getDelegate(RequestWriter.java:94) 
    at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getName(RequestWriter.java:104) 
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:247) 
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197) 
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) 
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) 
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) 
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:96) 
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:117) 
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:54) 
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650) 
    at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) 

[2]

<property>
  <name>http.content.limit</name>
  <value>150000000</value>
</property>

<property>
   <name>indexer.max.tokens</name>
   <value>100000</value>
</property>

<property>
  <name>http.timeout</name>
  <value>50000</value>
</property>

<property>
  <name>solr.commit.size</name>
  <value>100</value>
</property>
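
For reference, the indexing step that triggers the error is the one normally launched on Nutch 2.x with a command along these lines (a sketch only; the Solr URL and the -all batch flag are assumptions about this particular setup):

# Nutch 2.x: push the crawled/parsed data stored in HBase to Solr
bin/nutch solrindex http://localhost:8983/solr/ -all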

1 Answer:

Answer 0 (score: 0):

I solved it with the configuration below (in the mapred-site.xml file):

<property>
  <name>mapred.jobtracker.retirejob.interval</name>
  <value>3600000</value>
</property>

<property>
  <name>mapred.job.tracker.retiredjobs.cache.size</name>
  <value>100</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4000m -XX:+UseConcMarkSweepGC</value>
</property>

<property>
  <name>mapred.child.ulimit</name>
  <value>6000000</value>
</property>

<property>
  <name>mapred.jobtracker.completeuserjobs.maximum</name>
  <value>5</value>
</property>

<property>
  <name>mapred.job.tracker.handler.count</name>
  <value>5</value>
</property>
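
The stack trace in [1] shows the OutOfMemoryError being thrown inside the Hadoop map task's child JVM (org.apache.hadoop.mapred.Child) while SolrIndexWriter flushes its buffered documents, which is why raising mapred.child.java.opts helps where Tomcat's CATALINA_OPTS does not. On Hadoop 2.x/YARN that property is deprecated and split per task type; a rough equivalent of the heap part of the fix above would look like the sketch below (the -Xmx4000m value simply mirrors the answer and is an assumption, not a tuned recommendation):

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx4000m -XX:+UseConcMarkSweepGC</value>
</property>

<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx4000m -XX:+UseConcMarkSweepGC</value>
</property>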