I'm having trouble indexing a folder. Here is my example-data-config.xml:
<dataConfig>
<dataSource type="BinFileDataSource" />
<document>
<entity name="files"
dataSource="null"
rootEntity="false"
processor="FileListEntityProcessor"
baseDir="C:\Temp\" fileName=".*"
recursive="true"
onError="skip">
<field column="fileAbsolutePath" name="id" />
<field column="fileSize" name="size" />
<field column="fileLastModified" name="lastModified" />
<entity
name="documentImport"
processor="TikaEntityProcessor"
url="${files.fileAbsolutePath}"
format="text">
<field column="file" name="fileName"/>
<field column="Author" name="author" meta="true"/>
<field column="text" name="text"/>
</entity>
</entity>
</document>
</dataConfig>
Then I created schema.xml:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="fileName" type="string" indexed="true" stored="true" />
<field name="author" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" />
<field name="size" type="plong" indexed="true" stored="true" />
<field name="lastModified" type="pdate" indexed="true" stored="true" />
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
Finally, I modified solrconfig.xml, adding the request handler along with the dataimporthandler and dataimporthandler-extras jars:
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">example-data-config.xml</str>
</lst>
</requestHandler>
I ran it, and the result is:
The folder contains 20,000 files in different formats (.py, .java, .wsdl, etc.).
Any suggestions would be appreciated. Thanks :))
Answer 0 (score: 0)
Check your Solr logs. The reason the DataImportHandler is failing will definitely be there. I ran into the same situation and found through the Solr logs that my entity was throwing exceptions because of encrypted documents in the folder. Your cause may be different, but start by analyzing your Solr logs: run the DataImport again, then check the Logging section of the admin page for errors in the live log. If you get errors other than the one I mentioned, post them here so they can be understood and deciphered.
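If the log is long, it can help to pull out only the innermost "Caused by:" line of each ERROR entry, since that is usually the real reason an import row failed. Here is a rough Python sketch of that idea. The function name `root_causes` and the assumption that every log entry starts with a level followed by a thread name in parentheses (optionally preceded by a date and a time) are mine, not anything Solr provides, so adjust the pattern to your own log layout.

```python
import re

# Assumed entry layout: "[date time ]LEVEL (thread) ...", e.g. "ERROR (Thread-17) ...".
ENTRY_START = re.compile(r"^(?:\S+\s+\S+\s+)?(ERROR|WARN|INFO|DEBUG)\s+\(")
# A Java stack trace cause line: "Caused by: <exception class>: <message>".
CAUSED_BY = re.compile(r"^Caused by: ([\w.$]+)(?::\s*(.*))?$")


def root_causes(log_text):
    """Return (exception_class, message) for the last 'Caused by:' of each ERROR entry."""
    causes = []
    in_error = False   # True while we are inside an ERROR entry
    deepest = None     # most recent 'Caused by:' seen in the current entry

    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        entry = ENTRY_START.match(line)
        if entry:
            # A new log entry begins: flush the previous ERROR entry, if any.
            if in_error and deepest:
                causes.append(deepest)
            in_error = entry.group(1) == "ERROR"
            deepest = None
        elif in_error:
            cause = CAUSED_BY.match(line)
            if cause:
                deepest = (cause.group(1), cause.group(2) or "")

    # Flush the final entry when the log ends inside an ERROR.
    if in_error and deepest:
        causes.append(deepest)
    return causes
```

Read the log file into a string, pass it to `root_causes`, and count the returned pairs (for example with `collections.Counter`) to see which kinds of files are breaking the import most often.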
Answer 1 (score: 0)
ERROR (Thread-17) [ x:example] o.a.s.h.d.DocBuilder Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 157
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: org.apache.tika.exception.TikaException: image/png parse error
at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:115)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:159)
... 9 more
Caused by: javax.imageio.IIOException: I/O error reading PNG header!
at com.sun.imageio.plugins.png.PNGImageReader.readHeader(PNGImageReader.java:315)
at com.sun.imageio.plugins.png.PNGImageReader.getWidth(PNGImageReader.java:1361)
at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:92)
... 13 more
Caused by: javax.imageio.IIOException: Image width == 0!
at com.sun.imageio.plugins.png.PNGImageReader.readHeader(PNGImageReader.java:273)
... 15 more
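The innermost cause in this trace is a PNG whose header cannot be read ("Image width == 0!"), so it may be worth finding such files before running the import. Below is a minimal Python sketch, not part of Solr or Tika, that walks a folder and flags `.png` files whose first 24 bytes do not look like a valid PNG signature followed by an IHDR chunk with non-zero dimensions. The function names `png_header_problem` and `find_suspect_pngs` are hypothetical, and the check covers only the header, so a file that passes may still fail in Tika for other reasons.

```python
import os
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header_problem(path):
    """Return a short description of a header problem, or None if the header looks sane."""
    with open(path, "rb") as handle:
        # 8 bytes signature + 4 bytes chunk length + 4 bytes chunk type + 8 bytes width/height
        header = handle.read(24)
    if len(header) < 24:
        return "truncated header"
    if header[:8] != PNG_SIGNATURE:
        return "missing PNG signature"
    if header[12:16] != b"IHDR":
        return "first chunk is not IHDR"
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    if width == 0 or height == 0:
        return "zero width or height"
    return None


def find_suspect_pngs(base_dir):
    """Walk base_dir recursively and list (path, problem) for suspicious .png files."""
    suspects = []
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            if name.lower().endswith(".png"):
                path = os.path.join(root, name)
                problem = png_header_problem(path)
                if problem:
                    suspects.append((path, problem))
    return suspects
```

Calling `find_suspect_pngs(r"C:\Temp")` against the `baseDir` from the question returns the files to move aside or exclude before the next full import.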