I have configured my 2 servers to run in distributed mode (using Hadoop). My crawl setup is Nutch 2.2.1 with HBase (as storage) and Solr, where Solr runs under Tomcat. The problem occurs every time I attempt the final step, i.e. indexing the data from HBase into Solr: the job fails with error [1]. I tried adding CATALINA_OPTS (or JAVA_OPTS) like this:
CATALINA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -Xms1g -Xmx6000m -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled"
to Tomcat's catalina.sh script and ran the server with it, but it did not help. I also added the properties in [2] to my nutch-site.xml file, but the job still ends with an OutOfMemoryError. Can you help me?
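As a side note on the Tomcat part: catalina.sh automatically sources bin/setenv.sh if it exists, so a cleaner way to apply JVM flags than editing catalina.sh itself is a setenv.sh like the sketch below (flags taken from the question above; the file path assumes a standard Tomcat layout):

```shell
# Sketch of $CATALINA_HOME/bin/setenv.sh -- sourced automatically by
# catalina.sh on startup, so catalina.sh itself stays unmodified.
# Note: this only raises the heap of the Tomcat/Solr JVM; it does not
# affect the Hadoop map task JVMs where the OOM in [1] actually occurs.
CATALINA_OPTS="-XX:+UseConcMarkSweepGC -Xms1g -Xmx6g \
  -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled"
export CATALINA_OPTS
```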
[1]
2014-09-06 22:52:50,683 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:587)
at java.lang.StringBuffer.append(StringBuffer.java:332)
at java.io.StringWriter.write(StringWriter.java:77)
at org.apache.solr.common.util.XML.escape(XML.java:204)
at org.apache.solr.common.util.XML.escapeCharData(XML.java:77)
at org.apache.solr.common.util.XML.writeXML(XML.java:147)
at org.apache.solr.client.solrj.util.ClientUtils.writeVal(ClientUtils.java:161)
at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:129)
at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:355)
at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:271)
at org.apache.solr.client.solrj.request.RequestWriter.getContentStream(RequestWriter.java:66)
at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getDelegate(RequestWriter.java:94)
at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getName(RequestWriter.java:104)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:247)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:96)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:117)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:54)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
[2]
<property>
<name>http.content.limit</name>
<value>150000000</value>
</property>
<property>
<name>indexer.max.tokens</name>
<value>100000</value>
</property>
<property>
<name>http.timeout</name>
<value>50000</value>
</property>
<property>
<name>solr.commit.size</name>
<value>100</value>
</property>
Answer 0 (score: 0)
I solved it with the configuration below (in the mapred-site.xml file):
<property>
<name>mapred.jobtracker.retirejob.interval</name>
<value>3600000</value>
</property>
<property>
<name>mapred.job.tracker.retiredjobs.cache.size</name>
<value>100</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4000m -XX:+UseConcMarkSweepGC</value>
</property>
<property>
<name>mapred.child.ulimit</name>
<value>6000000</value>
</property>
<property>
<name>mapred.jobtracker.completeuserjobs.maximum</name>
<value>5</value>
</property>
<property>
<name>mapred.job.tracker.handler.count</name>
<value>5</value>
</property>
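The key settings here are mapred.child.java.opts, which gives each map/reduce child JVM a 4 GB heap (the stack trace in [1] shows the OOM happening inside the map task, not inside Tomcat, so this is the JVM that needed more memory), and mapred.child.ulimit, which is expressed in kilobytes and must exceed the child heap or the task will be killed by the ulimit instead. A quick sanity check of the two values above (a sketch; the numbers are copied from the config, the arithmetic is mine):

```shell
# mapred.child.ulimit is in KB; mapred.child.java.opts -Xmx is in MB.
# The ulimit must leave headroom above the heap for JVM overhead.
heap_mb=4000        # from -Xmx4000m
ulimit_kb=6000000   # from mapred.child.ulimit
ulimit_mb=$((ulimit_kb / 1024))
if [ "$ulimit_mb" -gt "$heap_mb" ]; then
  echo "ok: ulimit ${ulimit_mb} MB > heap ${heap_mb} MB"
else
  echo "bad: ulimit ${ulimit_mb} MB does not exceed heap ${heap_mb} MB"
fi
```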