Question

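I am trying to figure out two things in this post:

Why is "built" not stemmed to "build", even though the field type definition specifies a stemmer? "building", however, is stemmed to "build".
How can I use Luke to examine the index and see which words were stemmed to what? I cannot see in Luke that "building" was stemmed to "build". I know Lucene is stemming it, because searching for "build" successfully retrieves the rows containing "building".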

This link was very helpful, but it did not answer my questions.

For reference, here is the relevant portion of schema.xml.

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

and the field definition is:

<field name="features" type="text_en" indexed="true" stored="true" multiValued="true"/>

The data set consists of several documents: one document has "building" in the features field, one has "built" in the same field, and one has "Built-in" in the features field:

File hd.xml:

<field name="features">building NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>

File ipod_video.xml:

<field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>

File sd500.xml:

 <field name="features">built in flash, red-eye reduction</field>

Using Lukeall-3.3.0, this is what I get when I search for 'features:build'. Note that I get back 1 document (instead of the expected 3). Even in that one document I do not see the stemmed term; I see only the original word "building".

Searching again in Luke, this time for 'features:built', returns two documents.

Selecting one of them shows the original "Built-in" but not "build".

Answer 1

For special cases like this, you can use a StemmerOverrideFilter to adjust the stemming algorithm.
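
The Porter stemmer works by stripping suffixes, so an irregular form such as "built" is not reduced to "build" on its own. A minimal sketch of how the override could be wired into the text_en field type from the question is shown below. The dictionary file name stemdict.txt and its contents are assumptions made for illustration; they are not part of the original schema.

```xml
<!-- Hypothetical change, to be applied in both the index and query analyzers of text_en. -->
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<!-- Must come before the stemmer: tokens found in the dictionary are replaced by
     their mapped stem and marked as keywords, so the Porter stemmer skips them.
     stemdict.txt is an assumed file name, placed in the same conf directory as schema.xml. -->
<filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" ignoreCase="true"/>
<filter class="solr.PorterStemFilterFactory"/>
```

Under this sketch, stemdict.txt would hold one tab-separated mapping per line, for example the word built, a tab character, then build. Because the index-time analyzer changes, the documents would need to be reindexed before 'features:build' could match the "built" and "Built-in" documents.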

In Solr, why is "built" not stemmed to "build", while "building" is?
