Solr 5.3.1 - Db-import-handler - TikaEntityProcessor cannot find my HTTP file

Time: 2015-10-13 23:46:01

Tags: solr apache-tika

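My table has a column that holds a reference URL to a file, along with other columns. A sample table is shown below. I am trying to index the table rows in Solr together with the contents of the files. The files are reachable over HTTP under the prefix 'http://domain.com/', for example 'http://domain.com/file/sample1.pdf'. I will not be able to access these files through a file share.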

Filepath                author   Title
file/sample1.pdf        Jack     title 1
file/sample2.pdf        Bob      title 2
file/sample3.docx       Tim      title 2

My db-data-import XML looks like this:

<dataConfig>
    <dataSource name="dbrows" driver="oracle.jdbc.OracleDriver" 
                url="jdbc:oracle:thin:@....."
                user="***"
                password="***"/>    
    <dataSource type="BinFileDataSource" name="attachments" />

    <document>
        <entity name="docs" dataSource="dbrows" query="select 'http://domain.com/'||filepath as PATH,author,title from dummytable">
            <entity name="file"
                    processor="TikaEntityProcessor"
                    url="${docs.PATH}"
                    dataSource="attachments"
                    format="text"
                    onError="continue"
                    transformer="script:processFile">
                <field column="text" name="text" />
            </entity>
        </entity>
    </document>
</dataConfig>

The error I get is:

2015-10-13 23:15:43.859 WARN  (Thread-25) [   x:db] o.a.s.h.d.FileDataSource FileDataSource.basePath is empty. Resolving to: C:\Users\asdf\Downloads\Solr\solr-5.3.1\server\.
2015-10-13 23:15:43.860 ERROR (Thread-25) [   x:db] o.a.s.h.d.EntityProcessorWrapper Exception in entity : file:java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: http://domain.com/file/sample1.pdf (resolved to: C:\Users\asdf\Downloads\Solr\solr-5.3.1\server\.\http://domain.com/file/sample1.pdf
    at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:126)
    at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:51)
    at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:42)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:131)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.io.FileNotFoundException: Could not find file: http://domain.com/file/sample1.pdf (resolved to: C:\Users\asdf\Downloads\Solr\solr-5.3.1\server\.\http://domain.com/file/sample1.pdf
    at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:122)
    ... 12 more

2015-10-13 23:15:43.890 WARN  (Thread-25) [   x:db] o.a.s.h.d.FileDataSource FileDataSource.basePath is empty. Resolving to: C:\Users\asdf\Downloads\Solr\solr-5.3.1\server\.

Is this even possible? Any help is greatly appreciated.

1 answer:

Answer 0 (score: 2)

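Fixed. Use BinURLDataSource instead of BinFileDataSource. BinFileDataSource treats the url value as a local file path (the log shows it being resolved against the Solr server directory), whereas BinURLDataSource fetches it over HTTP.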

<dataSource type="BinFileDataSource" name="attachments" />

Change this to

<dataSource type="BinURLDataSource" name="attachments" />
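
For reference, here is a sketch of how the full data-config might look with that change applied. It assumes everything else from the question stays as posted: the JDBC URL is still elided, and processFile is the asker's own script function, which was not shown. The connectionTimeout and readTimeout attributes (in milliseconds) are optional extras that were not in the original config, and the values are only illustrative.

```xml
<dataConfig>
    <!-- Database rows; connection details are elided exactly as in the question -->
    <dataSource name="dbrows" driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@....."
                user="***"
                password="***"/>

    <!-- BinURLDataSource opens the url over HTTP instead of looking on the local disk.
         The two timeouts are an assumption added for illustration, not part of the original. -->
    <dataSource type="BinURLDataSource" name="attachments"
                connectionTimeout="5000"
                readTimeout="10000"/>

    <document>
        <!-- Outer entity: one row per file, with the full URL built in SQL as PATH -->
        <entity name="docs" dataSource="dbrows"
                query="select 'http://domain.com/'||filepath as PATH,author,title from dummytable">

            <!-- Inner entity: Tika extracts the text of the document found at docs.PATH.
                 processFile is defined elsewhere (not shown in the question). -->
            <entity name="file"
                    processor="TikaEntityProcessor"
                    url="${docs.PATH}"
                    dataSource="attachments"
                    format="text"
                    onError="continue"
                    transformer="script:processFile">
                <field column="text" name="text" />
            </entity>
        </entity>
    </document>
</dataConfig>
```

BinURLDataSource also accepts a baseUrl attribute, so the 'http://domain.com/' prefix could probably live there instead of in the SQL, but I have not tested that variant with this setup.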