Question

I want to index PDF files in Solr 4.3.0 using the DataImportHandler.

Here is what I have done so far:

My request handler:

<requestHandler name="/dataimport" 
class="org.apache.solr.handler.dataimport.DataImportHandler">  
    <lst name="defaults">  
      <str name="config">data-config.xml</str>  
    </lst>  
  </requestHandler>

My data-config.xml:

<dataConfig>  
<dataSource type="BinFileDataSource" />  
<document>  
<entity name="f" dataSource="null" rootEntity="false" 
processor="FileListEntityProcessor" 
baseDir="C:\Users\aroraarc\Desktop\Impdo" fileName=".*pdf" 
recursive="true">  
<entity name="tika-test" processor="TikaEntityProcessor" 
url="${f.fileAbsolutePath}" format="text">  
<field column="Author" name="author" meta="true"/>
<field column="title" name="title" meta="true"/>
<field column="text" name="text"/>
</entity>  
</entity>  
</document>  
</dataConfig>

Now, when I try to index the documents, I get the following error:

org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

Since I do not want a unique key in my case, I disabled it as follows.

In solrconfig.xml I commented out:

<searchComponent name="elevator" class="solr.QueryElevationComponent" >
    <!-- pick a fieldType to analyze queries -->
    <str name="queryFieldType">string</str>
    <str name="config-file">elevate.xml</str>
  </searchComponent>
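
To be explicit, "commented out" here means wrapping the whole component in an XML comment, roughly as in the sketch below. XML comments cannot be nested, so any comment inside the block has to be dropped. The wording of the outer comment is illustrative, and the exact text in my file may differ slightly.

```xml
<!-- elevator component disabled (sketch; it depends on the schema's uniqueKey)
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
    <str name="queryFieldType">string</str>
    <str name="config-file">elevate.xml</str>
</searchComponent>
-->
```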

In schema.xml I commented out <uniqueKey>id</uniqueKey>

and added:

<fieldType name="uuid" class="solr.UUIDField" indexed="true" /> 
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" />
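
Put together, the relevant parts of schema.xml would look roughly like this. This is only a sketch: the schema name and version, and the placement inside <types> and <fields>, follow the layout of the stock Solr 4.x example schema and are assumptions. All unrelated definitions are omitted.

```xml
<!-- name and version are assumed from the stock example schema -->
<schema name="example" version="1.5">
  <types>
    <!-- UUID type; combined with default="NEW" below, a new UUID is generated per document -->
    <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
  </types>
  <fields>
    <field name="id" type="uuid" indexed="true" stored="true" default="NEW" />
  </fields>
  <!-- uniqueKey declaration commented out:
  <uniqueKey>id</uniqueKey>
  -->
</schema>
```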

In elevate.xml I made the following change:

<elevate>
 <query text="foo bar">
  <doc id="4602376f-9741-407b-896e-645ec3ead457" />
 </query>
</elevate>

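With these changes the indexing runs, but the indexed document contains only the author, author_s, and id fields. It should contain the author, text, title, and id fields, as defined in my data-config.xml. Please help. Am I doing something wrong? And where does the author_s field come from?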

<doc>
    <str name="author">arora arc</str>
    <str name="author_s">arora arc</str>
    <str name="id">4f65332d-49d9-497a-b88b-881da618f571</str>
</doc>


0 answers: