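I'm new to Solr and am trying to index a file system with Solr's DIH (DataImportHandler). Interestingly, it worked fine, for a while. Now the DIH won't initialize, and I keep getting a SAXParseException: Content is not allowed in prolog.
Any ideas? I'm using Solr 3.6.0 on Debian. I checked the config file with a hex editor but didn't find anything.
Here is the data-config.xml: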
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource type="BinFileDataSource" name="bin"/>
<document>
<entity name="files"
processor="FileListEntityProcessor"
fileName=".*.(pdf)|(doc)|(docx)|(ppt)|(pptx)"
baseDir="/mnt/C"
rootEntity="false"
dataSource="null"
recursive="true"
onError="skip">
<field name="id" column="fileAbsolutePath"/>
<field name="lastModified" column="fileLastModified"/>
<entity name="f"
processor="TikaEntityProcessor"
url="${files.fileAbsolutePath}"
dataSource="bin"
format="text"
onError="skip">
<field name="fileName" column="file"/>
<field name="author" column="Author" meta="true"/>
<field name="title" column="title" meta="true"/>
<field name="text" column="text"/>
</entity>
</entity>
</document>
</dataConfig>
The error:
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:231)
at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:119)
at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:168)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:391)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1404)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:625)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:488)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:819)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:748)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:288)
at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:216)
... 18 more
Answer 0 (score: 0)
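The XML validates for me. The error points to line 1, column 1. Sometimes this is a byte order mark (BOM) problem: an invisible character at the very start of the file, before the `<?xml` declaration.
Perhaps you edited the file in an editor that got overzealous and explicitly added a BOM. Re-check the first few bytes in a hex editor (a UTF-8 BOM is the three bytes EF BB BF). Or try copying the content into a very plain text editor, saving it as a new file, and seeing whether that works.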
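If a hex editor is inconvenient, the same check can be scripted. The following is only a minimal sketch in Python, not part of Solr or the DIH: the helper names `leading_problem`, `strip_utf8_bom`, and `clean_file` are made up for illustration, and it assumes the config is meant to be UTF-8, as its XML declaration says. It reports whatever sits before the first `<` and rewrites the file only when that is a UTF-8 BOM.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def leading_problem(data: bytes) -> str:
    """Describe what precedes the first '<'; return '' if nothing does."""
    if data.startswith(UTF8_BOM):
        return "UTF-8 BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16 BOM"
    idx = data.find(b"<")
    if idx == -1:
        return "no '<' found"
    if idx > 0:
        # Show the stray bytes as hex so invisible characters become visible.
        return "unexpected leading bytes: " + data[:idx].hex()
    return ""


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (unchanged if there is none)."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


def clean_file(path: str) -> str:
    """Report the leading problem in a file; rewrite it only for a UTF-8 BOM."""
    with open(path, "rb") as fh:
        data = fh.read()
    problem = leading_problem(data)
    if problem == "UTF-8 BOM":
        with open(path, "wb") as fh:
            fh.write(strip_utf8_bom(data))
    return problem
```

Calling `clean_file` on a backup copy of data-config.xml (wherever it lives in your Solr conf directory) prints nothing by itself; inspect the returned string. An empty string means the file starts cleanly with `<`, in which case the problem may lie elsewhere, for example in a different file being loaded than the one you inspected.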