如何防止Solr内存提取Word文档?

时间:2012-12-06 09:53:55

标签: java solr out-of-memory solrj

我正在使用更新/提取请求处理程序上传一个9MB字的doc文件。我正在使用solrj来做到这一点。但是我在同一个文件上不断出现内存不足错误。我正在使用自动提交,但我不认为这是一个问题,因为即使我将其关闭也会失败。

由于gc,cpu使用率达到95%,而且过程大约需要4.4Gb

是否有人知道如何防止此错误?

这是错误:

  <date>2012-12-06Txx:xx:xx</date>
  <millis>1354786975705</millis>
  <sequence>157</sequence>
  <logger>org.apache.solr.servlet.SolrDispatchFilter</logger>
  <level>SEVERE</level>
  <class>org.apache.solr.common.SolrException</class>
  <method>log</method>
  <thread>16</thread>
  <message>null:java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:952)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Unknown Source)
    at java.lang.String.&lt;init&gt;(Unknown Source)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseCdataLiteral(PiccoloLexer.java:3027)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseQuotedTagValue(PiccoloLexer.java:2936)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1754)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXMLNS(PiccoloLexer.java:1293)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXML(PiccoloLexer.java:1261)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4808)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
    at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
    at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:134)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.&lt;init&gt;(XWPFDocument.java:116)
    at org.apache.poi.xwpf.extractor.XWPFWordExtractor.&lt;init&gt;(XWPFWordExtractor.java:53)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:87)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)

2 个答案:

答案 0 :(得分:1)

如果您有64位操作系统,请尝试使用此参数运行您的应用程序

java -Xmx1024m -jar yourfile.jar

将-Xmx1024的值更改为RAMsize的最大值,例如-Xmx16384m。确保将shell导航到jar文件所在的文件夹

答案 1 :(得分:0)

在完成大量应用程序后运行System.gc()