"unsupported type" exception when importing documents from a database with Solr 4.0

Time: 2013-04-26 17:49:52

Tags: solr dataimporthandler

I followed the information given in a related question to set up an import of all documents stored in a MySQL database. You can find the original question here.

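Thanks to the steps provided there, I was able to get it working with a MySQL DB. My configuration looks the same as the one in the link mentioned above.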

<dataConfig>
  <dataSource name="db"
    jndiName="java:jboss/datasources/somename"
    type="JdbcDataSource" 
    convertType="false" />
  <dataSource name="dastream" type="FieldStreamDataSource" />
  <dataSource name="dareader" type="FieldReaderDataSource" />
  <document name="docs">
    <entity name="doc" query="select * from document" dataSource="db">
      <field name="id" column="id" />
      <field name="name" column="descShort" />
      <entity name="comment" 
        transformer="HTMLStripTransformer" dataSource="db"
        query="select id, body, subject from comment where iddoc='${doc.id}'">
        <field name="idComm" column="id" />
        <field name="detail" column="body" stripHTML="true" />
        <field name="subject" column="subject" />
      </entity>
      <entity name="attachments" 
        query="select id, attName, attContent, attContentType from Attachment where iddoc='${doc.id}'"
        dataSource="db">
        <field name="attachment_name" column="attName" />
        <field name="idAttachment" column="id" />
        <field name="attContentType" column="attContentType" />
        <entity name="attachment" 
          dataSource="dastream"
          processor="TikaEntityProcessor"
          url="attContent"
          dataField="attachments.attContent"
          format="text"
          onError="continue">
          <field column="text" name="attachment_detail" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

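I have various kinds of attachments in the database, such as JPEG, PDF, Excel, DOC and plain text. Everything now works for most of the binary data (JPEG, PDF, DOC, etc.), but the import fails for some files. It seems the data source is set up to throw an exception when it encounters a String instead of an InputStream. I set the onError="continue" flag on the "attachment" entity to make sure the DataImport still goes through despite this error. I noticed that this problem occurs for many files. The exception is below. Any ideas?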

Exception in entity : attachment:java.lang.RuntimeException: unsupported type : class java.lang.String 
at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:89) 
at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:48) 
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:103) 
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) 
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465) 
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) 
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) 
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404) 
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319) 
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227) 
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422) 
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487) 
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)

1 answer:

Answer 0 (score: 1)

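I know this is an old question, but: it looks to me as if this exception is thrown when the BLOB (I use Oracle) is null. When I added a where clause like "blob_column is not null", the problem went away (Solr 4.10.1).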
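
Applied to the configuration in the question, such a filter could look like the sketch below. The only change is the added `and attContent is not null` condition on the "attachments" entity. It assumes that the null-BLOB explanation also holds for the MySQL setup in the question, which has not been verified here; rows whose attContent is NULL would simply no longer reach the TikaEntityProcessor.

```xml
<!-- Sketch only: the "attachments" entity from the question, with a NULL filter added to its query -->
<entity name="attachments"
  query="select id, attName, attContent, attContentType from Attachment where iddoc='${doc.id}' and attContent is not null"
  dataSource="db">
  <field name="attachment_name" column="attName" />
  <field name="idAttachment" column="id" />
  <field name="attContentType" column="attContentType" />
  <!-- Unchanged from the question: Tika reads the BLOB through FieldStreamDataSource -->
  <entity name="attachment"
    dataSource="dastream"
    processor="TikaEntityProcessor"
    url="attContent"
    dataField="attachments.attContent"
    format="text"
    onError="continue">
    <field column="text" name="attachment_detail" />
  </entity>
</entity>
```

Note that with this filter, attachments without content are skipped entirely, so their attName and attContentType are not indexed either. If that metadata is still needed, the filter would have to be applied some other way.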