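My table has a column that references a file by URL, along with other columns. A sample table is shown below. I am trying to index the table in Solr together with the content of those files. The files are reachable over HTTP under the prefix http://domain.com/, for example http://domain.com/file/sample1.pdf. I cannot access the files through a file share.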
Filepath            Author   Title
file/sample1.pdf    Jack     title 1
file/sample2.pdf    Bob      title 2
file/sample3.docx   Tim      title 2
My db-data-import XML looks like this:
<dataConfig>
    <!-- JDBC source that supplies the table rows -->
    <dataSource name="dbrows" driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@....."
                user="***"
                password="***"/>
    <!-- Binary source that the Tika entity reads the attachments from -->
    <dataSource type="BinFileDataSource" name="attachments" />
    <document>
        <!-- Outer entity: one row per table record, PATH = URL prefix + filepath -->
        <entity name="docs" dataSource="dbrows" query="select 'http://domain.com/'||filepath as PATH,author,title from dummytable" >
            <!-- Inner entity: extract the file's text with Tika from the row's PATH -->
            <entity name="file"
                    processor="TikaEntityProcessor"
                    url="${docs.PATH}"
                    dataSource="attachments"
                    format="text"
                    onError="continue"
                    transformer="script:processFile">
                <field column="text" name="text" />
            </entity>
        </entity>
    </document>
</dataConfig>
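For each row, the outer docs entity produces PATH by concatenating the prefix with filepath (Oracle's || operator), and the inner entity receives that value through ${docs.PATH}. The minimal Java sketch below mirrors only that concatenation for the sample rows. The class and method names are hypothetical and this is not DataImportHandler code. It makes one point visible: every value passed to the inner entity carries an http scheme and a host, so it names a remote resource rather than a local file.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the per-row PATH built by the SQL query. */
public class PathBuildSketch {

    // Same prefix as in the query: 'http://domain.com/' || filepath
    static final String PREFIX = "http://domain.com/";

    // Mirror the SQL concatenation for each Filepath value.
    static List<String> buildUrls(List<String> filepaths) {
        List<String> urls = new ArrayList<>();
        for (String p : filepaths) {
            urls.add(PREFIX + p);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = buildUrls(
                List.of("file/sample1.pdf", "file/sample2.pdf", "file/sample3.docx"));
        for (String u : urls) {
            URI uri = URI.create(u);
            // Each value has a scheme and a host: a remote resource, not a local path.
            System.out.println(u + " -> scheme=" + uri.getScheme() + ", host=" + uri.getHost());
        }
    }
}
```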
The error I get is:
2015-10-13 23:15:43.859 WARN (Thread-25) [ x:db] o.a.s.h.d.FileDataSource FileDataSource.basePath is empty. Resolving to: C:\Users\asdf\Downloads\Solr\solr-5.3.1\server\.
2015-10-13 23:15:43.860 ERROR (Thread-25) [ x:db] o.a.s.h.d.EntityProcessorWrapper Exception in entity : file:java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: http://domain.com/file/sample1.pdf (resolved to: C:\Users\asdf\Downloads\Solr\solr-5.3.1\server\.\http://domain.com/file/sample1.pdf
	at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:126)
	at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:51)
	at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:42)
	at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:131)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.io.FileNotFoundException: Could not find file: http://domain.com/file/sample1.pdf (resolved to: C:\Users\asdf\Downloads\Solr\solr-5.3.1\server\.\http://domain.com/file/sample1.pdf
	at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:122)
	... 12 more
2015-10-13 23:15:43.890 WARN (Thread-25) [ x:db] o.a.s.h.d.FileDataSource FileDataSource.basePath is empty. Resolving to: C:\Users\asdf\Downloads\Solr\solr-5.3.1\server\.
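The log shows what happened: with basePath empty, the file data source resolved the URL string as a path relative to the Solr server directory. The simplified Java sketch below reproduces that behavior with java.io.File. It is a hypothetical illustration, not Solr's FileDataSource source. A location that is not an absolute file path gets appended to the base directory, so an http:// URL turns into a nonexistent local path. The exact printed path differs by OS; on Unix, for example, java.io.File collapses the "//" after "http:".

```java
import java.io.File;

/** Hypothetical illustration (not Solr's code) of file-style path resolution. */
public class FileResolutionSketch {

    // Treat the location as a file path: absolute paths are used as-is,
    // everything else is resolved relative to basePath.
    static File resolve(String basePath, String location) {
        File f = new File(location);
        return f.isAbsolute() ? f : new File(basePath, location);
    }

    public static void main(String[] args) {
        // "." stands in for the empty basePath that resolved to the server directory.
        File resolved = resolve(".", "http://domain.com/file/sample1.pdf");
        System.out.println("Resolved to: " + resolved.getAbsolutePath());
        // The URL never becomes a network request; it is just a missing local file.
        System.out.println("Exists? " + resolved.exists());
    }
}
```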
Is this even possible? Any help is greatly appreciated.
Answer:
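Fixed. Use BinURLDataSource instead of BinFileDataSource. BinFileDataSource treats the url attribute as a local file path and resolves it against its basePath (the Solr server directory here, as the warning in the log shows), so http://domain.com/file/sample1.pdf was never found. BinURLDataSource opens the same value as a URL and fetches the file over HTTP.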
<dataSource type="BinFileDataSource" name="attachments" />
Change this to:
<dataSource type="BinURLDataSource" name="attachments" />
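With BinURLDataSource, the url value is opened as a network connection and the raw bytes are handed to Tika. The Java sketch below shows roughly what a URL-based binary source has to do, using java.net.URLConnection. It is an illustration under assumptions, not BinURLDataSource's implementation. A throwaway local com.sun.net.httpserver.HttpServer stands in for http://domain.com/, the served bytes are a fake PDF, and the 5 s / 10 s timeouts are arbitrary values chosen for the example. It requires Java 9+ for readAllBytes.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of fetching an attachment over HTTP instead of from disk. */
public class UrlFetchSketch {

    // Open the location as a URL and read the raw bytes.
    static byte[] fetch(String location) throws IOException {
        URLConnection conn = URI.create(location).toURL().openConnection();
        conn.setConnectTimeout(5000);  // assumed value, for illustration only
        conn.setReadTimeout(10000);    // assumed value, for illustration only
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();
        }
    }

    // Local stand-in for http://domain.com/ serving a single file (hypothetical).
    static HttpServer startFakeServer(byte[] body) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/file/sample1.pdf", exchange -> {
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakePdf = "%PDF-1.4 fake".getBytes(StandardCharsets.US_ASCII);
        HttpServer server = startFakeServer(fakePdf);
        try {
            // Same prefix + filepath concatenation as the SQL query, but with a local host.
            String base = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            String path = base + "file/sample1.pdf";
            byte[] got = fetch(path);
            System.out.println("Fetched " + got.length + " bytes from " + path);
        } finally {
            server.stop(0);
        }
    }
}
```

Binding to port 0 lets the OS pick a free port, so the sketch does not collide with anything already running; getAddress() then reports the port actually chosen.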