Question

In synonyms.txt, I have:

you're => you are

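Looking at what the Analysis tool outputs, "because you're already" is expanded into "because you already are", which is fine for full-text search but a big problem for shingles. I wondered whether the expansion is simply appended at the end, but "you're because my" is expanded into "you because are my", with the following word inserted between the two. I also tested "because my you're already", and it is expanded into "because my you already are".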

Any ideas why this happens?

Here is a screenshot of the Analysis tool to make it 100% clear: screencap
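The orderings above are consistent with a simple layout rule: the first word of a multi-word replacement keeps the original token's position, and each further word takes the next position, stacked on whatever input token already sits there. The sketch below is a simplified model under that assumption, not Solr source code; `expand`, `read_in_position_order` and `RULES` are hypothetical names.

```python
# Simplified model (an assumption, not Solr source code) of how the classic
# SynonymFilter lays out a single-token => multi-word replacement.

RULES = {"you're": ["you", "are"]}  # mirrors the rule "you're => you are"


def expand(tokens, rules):
    """Return (position, stacked, term) triples after applying the rules."""
    laid_out = []
    for pos, tok in enumerate(tokens, start=1):
        for offset, word in enumerate(rules.get(tok, [tok])):
            # The first output word keeps the original position; every further
            # word takes the next position, stacked on the input token there.
            laid_out.append((pos + offset, offset > 0, word))
    return laid_out


def read_in_position_order(laid_out):
    """List terms by position; within a position, the input token comes first."""
    return [term for _, _, term in sorted(laid_out, key=lambda t: (t[0], t[1]))]


for text in ["because you're already", "you're because my", "because my you're already"]:
    print(" ".join(read_in_position_order(expand(text.split(), RULES))))
```

Run as-is, it prints the same three orderings reported above, which suggests the word "are" is not appended at the end but placed at the position of the next input token.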

Answer 1

In the query part of the schema:

  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="wordlists/english-common-nouns.txt" minWordSize="5" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
         possible with WordDelimiterFilter in conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

I just let WDF (WordDelimiterFilter) do the tokenizing, so you're => you re. In synonyms.txt I defined:

you re => you are

It is not the most elegant way, but it works: the tokens are stored in the order you need.

Screenshot to prove it.
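A rough sketch of why this workaround keeps the order: once "you're" is split into "you" and "re", the rule maps two tokens onto two tokens, so each output word can take the position of the input word it replaces and nothing spills onto the next word. The regex split below is an assumed stand-in for WordDelimiterFilter with `generateWordParts="1"` followed by lowercasing, and `word_delimiter` and `apply_rule` are hypothetical helpers, not Solr APIs.

```python
import re


def word_delimiter(text):
    """Rough stand-in (assumption) for WDF with generateWordParts=1 plus lowercasing."""
    return [part.lower() for part in re.split(r"[^0-9A-Za-z]+", text) if part]


def apply_rule(tokens, lhs, rhs):
    """Replace each occurrence of the token sequence lhs with rhs (same length)."""
    if len(lhs) != len(rhs):
        raise ValueError("this sketch only models equal-length rules")
    out, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + len(lhs)]) == lhs:
            out.extend(rhs)  # word k of rhs takes the position of word k of lhs
            i += len(lhs)
        else:
            out.append(tokens[i])
            i += 1
    return out


tokens = apply_rule(word_delimiter("You're because my"), ("you", "re"), ("you", "are"))
print(list(enumerate(tokens, start=1)))
```

This prints `[(1, 'you'), (2, 'are'), (3, 'because'), (4, 'my')]`: every word sits at its own position, which is the order a shingle stage would need.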

Answer 2

You can use the Synonym-Expanding EDisMax Parser, which adds synonyms before text analysis takes place: https://github.com/healthonnet/hon-lucene-synonyms
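The idea of expanding synonyms before analysis can be illustrated roughly as follows: the query text itself is rewritten into alternative phrases, and each phrase is then analyzed normally, so its words get ordinary consecutive positions. This is a hedged illustration of the general approach only, not the plugin's code or configuration; `expand_query` is a hypothetical helper and only single-word rules are handled.

```python
def expand_query(query: str, rules: dict) -> str:
    """Build an OR of phrase queries: the original text plus a rewritten variant."""
    words = query.split()
    original = " ".join(words)
    # Apply single-word rules to the raw text, before any analysis happens.
    rewritten = " ".join(rules.get(w.lower(), w) for w in words)
    variants = [original] if rewritten == original else [original, rewritten]
    return " OR ".join(f'"{v}"' for v in variants)


print(expand_query("you're because my", {"you're": "you are"}))
```

This prints `"you're because my" OR "you are because my"`; in the second phrase, "are" directly follows "you" because the analyzer never sees a multi-word synonym.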

solr: word order is lost when a word is replaced by a multi-word synonym
