SOLR 8.1.1 EdgeNGramFilterFactory query parsing

Posted: 2019-07-30 20:28:19

Tags: apache solr tokenize

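I have a SOLR 4.10.2 core that I am upgrading to 8.1.1.

I created the 8.1.1 core manually from the default_config configset, then carried my settings over into the 8.1.1 schema.

I have adjusted schema.xml and solrconfig.xml, and the core is queryable in 8.1.1.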

I have a field named Company:

<field name="Company" type="string" indexed="true" stored="true"/>
<field name="IDX_Company" type="text_general" indexed="true" stored="false" multiValued="true" />
<copyField source="Company" dest="IDX_Company"/>

In 4.10.2, when I run the query:

IDX_Company:blue

with debugQuery enabled, I see the query correctly parsed into multiple parts:

"debug": {
    "rawquerystring": "IDX_Company:blue",
    "querystring": "IDX_Company:blue",
    "parsedquery": "(IDX_Company:b IDX_Company:bl IDX_Company:blu IDX_Company:blue)/no_coord",

...
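The four clauses in that parsedquery are exactly the front edge n-grams of "blue" with minGramSize="1" and maxGramSize="15". A minimal Python model of that expansion, assuming only the two size parameters matter (other EdgeNGramFilterFactory options are ignored, and the "/no_coord" suffix Lucene 4.x prints is left out), reproduces the clause list:

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    """Return the front edge n-grams of `token`, shortest first.

    Defaults mirror minGramSize="1" maxGramSize="15" from the schema.
    Simplified model: tokens shorter than min_gram produce no grams.
    """
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def expanded_query(field: str, term: str) -> str:
    """Format the grams as space-separated clauses, like the 4.10.2 parsedquery."""
    clauses = " ".join(f"{field}:{gram}" for gram in edge_ngrams(term))
    return f"({clauses})"


if __name__ == "__main__":
    print(expanded_query("IDX_Company", "blue"))
    # (IDX_Company:b IDX_Company:bl IDX_Company:blu IDX_Company:blue)
```

If the query-time analyzer is really running, a single input term should turn into several clauses like this; a single unchanged clause suggests the filter chain was not applied.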

When I run it on 8.1.1 with debugQuery enabled, I get the following:

"debug":{
    "rawquerystring":"IDX_Company:blue",
    "querystring":"IDX_Company:blue",
    "parsedquery":"IDX_Company:blue",

...
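One quick way to compare the two debug outputs is to count how many clauses on the field appear in parsedquery. The sketch below is a heuristic of my own (the function name `ngrams_applied` is hypothetical, not part of Solr); it treats more than one clause on the field as evidence that the n-gram expansion happened, which holds whether Solr prints the clauses as a plain boolean list or wraps them in another query type:

```python
def ngrams_applied(debug: dict, field: str) -> bool:
    """Heuristic: True when parsedquery contains more than one clause on `field`."""
    parsed = debug["parsedquery"]
    return parsed.count(f"{field}:") > 1


if __name__ == "__main__":
    # parsedquery values copied from the 4.10.2 and 8.1.1 debug output above
    solr4 = {"parsedquery": "(IDX_Company:b IDX_Company:bl IDX_Company:blu IDX_Company:blue)/no_coord"}
    solr8 = {"parsedquery": "IDX_Company:blue"}
    print(ngrams_applied(solr4, "IDX_Company"), ngrams_applied(solr8, "IDX_Company"))
```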

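It appears that the EdgeNGramFilterFactory is not being applied. Following the documentation, the only change I made to the EdgeNGramFilterFactory configuration was removing the "side" attribute. Also following the documentation, I replaced SynonymFilterFactory with SynonymGraphFilterFactory and added FlattenGraphFilterFactory.

I have tried removing FlattenGraphFilterFactory, clearing and repopulating the core (reindexing), and stopping and restarting SOLR 8.1.1; none of it made any difference.

Here is the definition of text_general that I am using in schema.xml: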

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/> <!-- RDH - removed side="front"-->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- RDH SynonymFilterFactory has been deprecated, replace with SynonymGraphFilterFactory -->
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
        <!-- RDH https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html
            Flatten Graph Filter
            This filter must be included on INDEX-time analyzer specifications that include at least one graph-aware filter, including Synonym Graph Filter and Word Delimiter Graph Filter.
        -->
        <filter class="solr.FlattenGraphFilterFactory"/>  
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strip all punctuation -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" /> <!-- RDH -->
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>       
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/> <!-- RDH - removed side="front"-->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- RDH SynonymFilterFactory is deprecated, replace with SynonymGraphFilterFactory -->
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <!-- RDH https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html
            Flatten Graph Filter
            This filter must be included on INDEX-time analyzer specifications that include at least one graph-aware filter, including Synonym Graph Filter and Word Delimiter Graph Filter.
        -->
        <filter class="solr.FlattenGraphFilterFactory"/>  
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strip all punctuation -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" /> <!-- RDH -->
      </analyzer>
    </fieldType>

1 answer:

Answer 0 (score: 0)

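Although I had reloaded the data by clearing it and posting it back into the core, I had neglected to go to the Core Admin page, select the core, and click the Reload button.

The query is now parsed as expected.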
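The Reload button on the Core Admin page corresponds to the CoreAdmin API's RELOAD action, so the same step can be scripted. A minimal sketch, assuming a standalone (non-SolrCloud) Solr at http://localhost:8983/solr and a hypothetical core name "mycore":

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(base_url: str, core: str) -> str:
    """Build the CoreAdmin RELOAD URL for `core` (base_url ends in /solr)."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def reload_core(base_url: str, core: str) -> int:
    """Send the RELOAD request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError if Solr answers with an error status.
    """
    with urlopen(reload_url(base_url, core)) as resp:
        return resp.status


if __name__ == "__main__":
    # "mycore" is a placeholder; substitute the real core name
    print(reload_url("http://localhost:8983/solr", "mycore"))
```

Reloading makes the core pick up the edited schema.xml and solrconfig.xml; if index-time analysis also changed, the documents still need to be reindexed afterwards.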