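Solr stop words: queries containing 'of' return no results, but excluding it gives the correct results

Time: 2015-10-16 16:11:15

Tags: solr stop-words

Can anyone explain how stop words work in Solr? In my stopwords.txt I have defined of. In schema.xml I have: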

<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>

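Now, whenever I search for anything containing the word of, nothing shows up in the results.

Example: oil of olay returns no results, while oil olay returns the correct results.

More of the field definition: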

        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
            <filter class="solr.StopFilterFactory"
                    ignoreCase="true"
                    words="stopwords.txt"
                    enablePositionIncrements="true"
                    />
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1"
                    generateNumberParts="1"
                    catenateWords="1"
                    catenateNumbers="1"
                    catenateAll="1"
                    preserveOriginal="1"
                    splitOnCaseChange="0"
                    splitOnNumerics="0"
                    types="wdtypes.txt"
                    />
            <filter class="solr.KeywordRepeatFilterFactory"/>
            <filter class="solr.EnglishMinimalStemFilterFactory"/>
            <filter class="solr.TrimFilterFactory" updateOffsets="false"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>

        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.StopFilterFactory"
                    ignoreCase="true"
                    words="stopwords.txt"
                    enablePositionIncrements="true"
                    />
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1"
                    generateNumberParts="1"
                    catenateWords="1"
                    catenateNumbers="1"
                    catenateAll="1"
                    preserveOriginal="1"
                    splitOnCaseChange="0"
                    splitOnNumerics="0"
                    types="wdtypes.txt"
                    />
            <filter class="solr.EnglishMinimalStemFilterFactory"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>

Debug output: +(upclist:cream+of+wheat&amp;qt=productresults&amp;rows=10&amp;fq=status%3AActive&amp;fq=facilitystatus%3AActive&amp;fq=facilityid%3A100&amp;fq=inventoryctrlcode%3A%5B0+TO+100%5D&amp;fq=weblifecycle%3A%283+OR+4%29&amp;fq=groupnumber%3A2^1.2 | keywords:cream+of+wheat&amp;qt=productresults&amp;rows=10&amp;fq=status%3aactive&amp;fq=facilitystatus%3aactive&amp;fq=facilityid%3a100&amp;fq=inventoryctrlcode%3a%5b0+to+100%5d&amp;fq=weblifecycle%3a%283+or+4%29&amp;fq=groupnumber%3a2^20.0 | product_elevate:cream+of+wheat&amp;qt=productresults&amp;rows=10&amp;fq=status%3aactive&amp;fq=facilitystatus%3aactive&amp;fq=facilityid%3a100&amp;fq=inventoryctrlcode%3a%5b0+to+100%5d&amp;fq=weblifecycle%3a%283+or+4%29&amp;fq=groupnumber%3a2^5.0 | area:"(cream+of+wheat&amp;qt=productresults&amp;rows=10&amp;fq=status%3aactive&amp;fq=facilitystatus%3aactive&amp;fq=facilityid%3a100&amp;fq=inventoryctrlcode%3a%5b0+to+100%5d&amp;fq=weblifecycle%3a%283+or+4%29&amp;fq=groupnumber%3a2 cream) of wheat qt productresult (row creamofwheatqtproductresultsrow) 10 fq status%3aactive fq facilitystatus%3aactive fq facilityid%3a100 fq inventoryctrlcode%3a%5b0 (to fqstatus%3aactivefqfacilitystatus%3aactivefqfacilityid%3a100fqinventoryctrlcode%3a%5b0to) 100%5d fq weblifecycle%3a%283 (or fqweblifecycle%3a%283or) 4%29 fq (groupnumber%3a2 fqgroupnumber%3a2 creamofwheatqtproductresultsrows10fqstatus%3aactivefqfacilitystatus%3aactivefqfacilityid%3a100fqinventoryctrlcode%3a%5b0to100%5dfqweblifecycle%3a%283or4%29fqgroupnumber%3a2)"~3^2.5 | productnumber:cream+of+wheat&amp;qt=productresults&amp;rows=10&amp;fq=status%3AActive&amp;fq=facilitystatus%3AActive&amp;fq=facilityid%3A100&amp;fq=inventoryctrlcode%3A%5B0+TO+100%5D&amp;fq=weblifecycle%3A%283+OR+4%29&amp;fq=groupnumber%3A2^1.7 | productname:cream+of+wheat&amp;qt=productresults&amp;rows=10&amp;fq=status%3aactive&amp;fq=facilitystatus%3aactive&amp;fq=facilityid%3a100&amp;fq=inventoryctrlcode%3a%5b0+to+100%5d&amp;fq=weblifecycle%3a%283+or+4%29&amp;fq=groupnumber%3a2^10.0)~0.01 ()

2 Answers:

Answer 0: (score: 0)

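This may not be relevant, since you say you are searching only a single field (I am posting it anyway because you said you use edismax and qf). I ran into a similar problem when I wanted to boost exact matches, so I had set qf like this: <str name="qf">title^45 title_str^55</str>. The title field uses stop words, and title_str obviously does not. The reason why searches containing stop words often find nothing is described here. Their solution is to adjust the mm value. The solution that worked in my case was to put title_str in the pf parameter (and remove it from qf), so that exact matches show up at the top.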
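The change described above can be sketched as a set of request parameters. This is only an illustration: the field names `title` and `title_str` come from this answer, keeping the original boosts of 45 and 55 after the move is an assumption, and the example query is borrowed from the question. `defType`, `q`, `qf`, and `pf` are standard edismax request parameters.

```python
from urllib.parse import urlencode

# Assumed fields (from the answer): `title` is analyzed with a stop filter,
# `title_str` is an unanalyzed copy of the same text.
params = {
    "defType": "edismax",
    "q": "oil of olay",
    "qf": "title^45",      # term matching only on the stop-word field
    "pf": "title_str^55",  # phrase boost so exact matches rank first
}

# Build the query string that would be appended to the select handler URL.
query_string = urlencode(params)
print(query_string)
```

With this layout every field in qf drops `of` in the same way, so no query clause depends on a field that still indexes the stop word, while pf only influences ranking and never excludes a document.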

Answer 1: (score: 0)

I finally solved this by changing the following:

"mm" from 2<-25% to 2<-36%
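A rough way to see why this helps is to work through the mm arithmetic. The sketch below is a simplified model of the documented rule (a spec `N<P%` applies the percentage only when there are more than N optional clauses, and percentage results are truncated), not Solr's actual code. The helper name `min_should_match` is hypothetical, and it assumes a query such as `cream of wheat` ends up as three optional clauses.

```python
def min_should_match(spec: str, clause_count: int) -> int:
    """Simplified model of mm for specs like '2<-25%', '75%' or '-1'."""
    threshold, _, value = spec.partition("<")
    if not value:
        # No '<' condition: the whole spec is the value.
        value, threshold = threshold, None
    if threshold is not None and clause_count <= int(threshold):
        # At or below the threshold, every clause is required.
        return clause_count
    if value.endswith("%"):
        calc = clause_count * int(value[:-1]) / 100
        # int() truncates toward zero; a negative value counts clauses
        # that are allowed to be missing.
        result = clause_count + int(calc) if calc < 0 else int(calc)
    else:
        n = int(value)
        result = clause_count + n if n < 0 else n
    return max(0, min(result, clause_count))


for spec in ("2<-25%", "2<-36%"):
    print(spec, "->", min_should_match(spec, 3), "of 3 clauses required")
```

Under these assumptions, 25% of three clauses truncates to zero, so all three clauses (including `of`) must match, whereas 36% of three truncates to one, so one clause may be missing and a document can match without `of`.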