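Solr SynonymGraphFilterFactory error on import

Date: 2018-03-05 09:51:12

Tags: solr

After upgrading to Solr 7.2, the import started logging errors for some documents.

The field type that produces the error: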

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

During the import, the following error is returned for some records:

org.apache.solr.common.SolrException: Exception writing document id XXXXX to the index; possible analysis error: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=2874,endOffset=2878,lastStartOffset=2879 for field 'XXXXX'
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:226)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:936)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:616)
at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80)

It is related to these two filters:
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.FlattenGraphFilterFactory"/>

If I remove them, the import works fine. Previously we used:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

That worked fine, but SynonymFilterFactory is deprecated in Solr 7.x and has been replaced by SynonymGraphFilterFactory, so I switched to it and added FlattenGraphFilterFactory after it, as recommended.

If the synonyms.txt file is changed from:

word,synonym1,synonym2

to:

word => synonym1,synonym2

everything works fine, but

word => word,synonym1,synonym2

does not work.

I am not sure why Solr returns these errors.

Thanks in advance for any suggestions.

1 Answer:

Answer 0: (score: 0)

Adding solr.FlattenGraphFilterFactory after WordDelimiterGraphFilterFactory solved my problem, as suggested by Webster Homer (http://lucene.472066.n3.nabble.com/Solr-SynonymGraphFilterFactory-error-on-import-td4378265.html).
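For illustration, here is a minimal sketch of how the index analyzer from the question might look with that change applied. The answer does not say whether the original FlattenGraphFilterFactory after the synonym filter was kept, so keeping it here is an assumption. All other filters are copied unchanged from the question.

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <!-- Assumption: the original flatten step after the synonym filter is kept -->
  <filter class="solr.FlattenGraphFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
  <!-- Added: WordDelimiterGraphFilterFactory also emits a token graph, so flatten it before indexing -->
  <filter class="solr.FlattenGraphFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPossessiveFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

The query analyzer is left as in the question, since FlattenGraphFilterFactory is only needed at index time. After changing the schema, the affected documents need to be reindexed.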