我正在使用 Solr 6.4.1 版本,我最近向solr发布了大约1000个文件以进行索引。我在Windows 10中使用 Windows Powershell 来使用命令发布文件。
PS C:\ solr-6.4.1> java -Dc = Solr_sample -Dauto = yes -Ddata = files -Drecursive = yes -jar example / exampledocs / post.jar E:\ Test \
但是在其中我发现一个文件没有编入索引,我试图再次使用以下命令索引该特定文件,但没有运气。该文件大小为212MB。我已经附上了错误以及所有内容。你能帮我把这个文件发给Solr索引吗。
PS C:\solr-6.4.1> java -Dc=Solr_sample -Dauto=yes -Ddata=files -Drecursive=yes -jar example/exampledocs/post.jar E:\Test\C0000000045\
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/Solr_sample/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory E:\Test\C0000000045 (1 files, depth=0)
POSTing file 20162436739-Spheres Volume 3 Foams Plural Spherology. Peter Sloterdijk. MIT.pdf (application/pdf) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #500 (Server Error) for url: http://localhost:8983/solr/Solr_sample/update/extract?resource.name=E%3A%5CTest%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Spherology.+Peter+Sloterdijk.+MIT.pdf&literal.id=E%3A%5C
Test%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Spherology.+Peter+Sloterdijk.+MIT.pdf
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 500 Server Error</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /solr/Solr_sample/update/extract. Reason:
<pre> Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.OutOfMemoryError: Java heap space
at java.io.PushbackInputStream.<init>(Unknown Source)
at org.apache.pdfbox.pdfparser.InputStreamSource.<init>(InputStreamSource.java:39)
at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:821)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:727)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:652)
at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:612)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:215)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:972)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:908)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:131)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
</pre>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost:8983/solr/Solr_sample/update/extract?resource.name=E%3A%5CTest%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Sp
herology.+Peter+Sloterdijk.+MIT.pdf&literal.id=E%3A%5CTest%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Spherology.+Peter+Sloterdijk.+MIT.pdf
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/Solr_sample/update...
Time spent: 0:00:13.795
答案 0 :(得分:1)
jvm用完了ram,因为你没有明确地设置堆大小,jvm正在使用默认值。从pdf中提取文本可能会占用很多内存,因此您可以尝试尽可能多地提供Solr(请注意,对于这种特殊情况,而不是一般的Solr用法),因此请使用更多ram启动solr。这取决于你现在如何开始,如果你正在使用内置服务,编辑solr.in.sh并取消注释/修改此行
#SOLR_JAVA_MEM="-Xmx8g-Xmx8g"
如果你有8gb免费使用(适应你的情况)
答案 1 :(得分:1)