SOLR 8.1.1: SynonymQuery(Synonym(...)) in parsedQuery

Date: 2019-07-31 20:51:03

Tags: apache solr

I have a SOLR 4.10.2 core that I am upgrading to 8.1.1.

When I compare the parsedquery of the same search in 4.10 and 8.1.1, I see that in 8.1.1 the query is wrapped in SynonymQuery(Synonym(...)).

I think this may be because I replaced the deprecated SynonymFilterFactory with SynonymGraphFilterFactory.

Is this new behavior of SynonymGraphFilterFactory? How do I get rid of it (and should I get rid of it at all)?

4.10.2

"querystring":"IDX_Company:blue",
    "parsedquery":"(IDX_Company:b IDX_Company:bl IDX_Company:blu IDX_Company:blue)",
...

8.1.1

"querystring":"IDX_Company:blue",
    "parsedquery":"SynonymQuery(Synonym(IDX_Company:b IDX_Company:bl IDX_Company:blu IDX_Company:blue))",
...

Here are the field definitions:

<field name="Company" type="string" indexed="true" stored="true"/>
<field name="IDX_Company" type="text_general" indexed="true" stored="false" multiValued="true" />
<copyField source="Company" dest="IDX_Company"/>

Here is the definition of text_general that I use in schema.xml:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/> <!-- RDH - removed side="front"-->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- RDH SynonymFilterFactory has been deprecated, replace with SynonymGraphFilterFactory -->
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
        <!-- RDH https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html
            Flatten Graph Filter
            This filter must be included on INDEX-time analyzer specifications that include at least one graph-aware filter, including Synonym Graph Filter and Word Delimiter Graph Filter.
        -->
        <filter class="solr.FlattenGraphFilterFactory"/>  
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strip all punctuation -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" /> <!-- RDH -->
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>       
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/> <!-- RDH - removed side="front"-->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- RDH SynonymFilterFactory is deprecated, replace with SynonymGraphFilterFactory -->
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <!-- RDH https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html
            Flatten Graph Filter
            This filter must be included on INDEX-time analyzer specifications that include at least one graph-aware filter, including Synonym Graph Filter and Word Delimiter Graph Filter.
        -->
        <filter class="solr.FlattenGraphFilterFactory"/>  
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strip all punctuation -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" /> <!-- RDH -->
      </analyzer>
    </fieldType>

1 answer:

Answer 0 (score: 0)

If you are referring to the expansion blue => b, bl, blu, blue, that comes from the EdgeNGramFilterFactory in your text_general field type.
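If the intent of the edge n-grams is prefix matching (so that `blue` matches company names containing a word that starts with "blue"), a common arrangement is to apply EdgeNGramFilterFactory in the index analyzer only: the grams are written to the index, while the query term stays whole. The fragment below is a sketch under that assumption; it is the original query analyzer with only the EdgeNGramFilterFactory line removed, everything else unchanged.

```xml
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- EdgeNGramFilterFactory removed: grams are produced at index time only,
       so the query term "blue" is matched as-is against the indexed grams -->
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.FlattenGraphFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" />
</analyzer>
```

Since only the query analyzer changes, no reindexing should be needed; the parsed query for `IDX_Company:blue` should then contain the single term `blue` instead of the four grams.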

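However, the SynonymQuery you see in the parsed query is caused by SynonymGraphFilterFactory, even though no synonym expansion actually takes place, since you probably have no synonyms for blue in your synonyms.txt file.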

If you don't need synonym matching at all, simply remove the SynonymGraphFilterFactory filter from your schema.
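A sketch of text_general with synonym matching removed, assuming synonyms are needed at neither index nor query time. FlattenGraphFilterFactory is dropped as well, because it only exists to flatten the token graph produced by graph filters such as SynonymGraphFilterFactory. Since the index analyzer changes, existing documents would need to be reindexed. One caveat: EdgeNGramFilterFactory places all grams of a token at the same position, and in Solr 8 the query parser (via Lucene's QueryBuilder) turns several query tokens at one position into a SynonymQuery, so the wrapper may remain as long as the query analyzer still applies EdgeNGramFilterFactory.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- SynonymGraphFilterFactory and FlattenGraphFilterFactory removed -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- strip all punctuation -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- SynonymGraphFilterFactory and FlattenGraphFilterFactory removed -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- strip all punctuation -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" />
  </analyzer>
</fieldType>
```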