If the query string starts with a short word

Time: 2017-10-12 09:20:25

Tags: solr lucene

I am using Solr 4 and I have a length filter factory:

<filter class="solr.LengthFilterFactory" min="3" max="99"/>

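If my query string starts with a word shorter than 3 characters, Solr returns no results. I did not expect this to be a problem, precisely because I am using LengthFilterFactory. Here are some examples:

The title is: "In the close future..."

If I search q:In the close future, Solr returns nothing. If I search q:the close future, Solr finds the record.

The title is: "I have some solr problem". The same thing happens as above.

I cannot allow searching on words shorter than 3 characters, but I did not expect that using such a word in the query would make Solr fail to match at all. Maybe LengthFilterFactory is not the problem?

Here is an example of my query: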

INFO: [collection1] webapp=/solr-example path=/select params={mm=100%25&json.nl=flat&fl=id&start=0&sort=date_0_i+desc,hour_0_i+desc&fq=type_s:(1+5+6+8+9+10)&fq=site_i:1&fq=terms_txt:I+have+some+solr+problem&fq=date_in_i:[20050101+TO+*]&fq=date_in_i:[*+TO+20171012]&fq=language_is:0&rows=10&bq=&q=I+have+some+solr+problem&tie=0.1&defType=edismax&omitHeader=true&qf=terms_txt&wt=json} hits=0 status=0 QTime=1

Here is my schema. I am showing the field type definition for the field I am searching. Does anyone know what is wrong here?

<fieldType name="text_general_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>          
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^|\s)([^\-\_&amp;\s]+([\-\_&amp;]+[^\-\_&amp;\s]*)+)(?=(\s|$))" replacement="$1MжџљМ$2 $2" />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\bMжџљМ([^\s]*?)\b[\-_&amp;]+" replacement="MжџљМ$1" />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\bMжџљМ([^\s]*?)\b[\-_&amp;]+" replacement="MжџљМ$1" />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\bMжџљМ([^\s]*?)\b[\-_&amp;]+" replacement="MжџљМ$1" />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="MжџљМ" replacement="" />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\w)&amp;(\w)" replacement="$1and$2" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="3" max="99"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b[\-_]+\b" replacement="" />
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\w)&amp;(\w)" replacement="$1and$2" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="3" max="99"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

1 answer:

Answer 0: (score: 1)

The problem with your query is that if you search for

field:I have a problem

then, after parsing, you actually get the query field:I defaultField:have defaultField:a ..., where the default field is usually specified in your solrconfig.xml. You can debug problems like this yourself by adding the debugQuery parameter to the request.
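The parsing described above can be mimicked with a small Python sketch. It is only a rough model and not Solr's actual parser: it splits on whitespace, binds a word to a field only when it is written as field:word, and then applies a min/max length check like the one in the question's schema. The default field name text is an assumption, and in a real setup the default field would be analyzed by its own chain, which may differ.

```python
def split_clauses(query, default_field):
    """Rough model: only a word written as field:word is bound to that field."""
    clauses = []
    for word in query.split():
        field, sep, term = word.partition(":")
        if sep and field and term:
            clauses.append((field, term))
        else:
            # No explicit field prefix, so the word goes to the default field.
            clauses.append((default_field, word))
    return clauses


def length_filter(clauses, min_len=3, max_len=99):
    """Drop terms outside [min_len, max_len], like min="3" max="99"."""
    return [(f, t) for f, t in clauses if min_len <= len(t) <= max_len]


# "text" is a hypothetical default field name, used only for illustration.
clauses = split_clauses("terms_txt:I have some solr problem", default_field="text")
print(clauses)
print(length_filter(clauses))
```

In this model only the first word is bound to terms_txt, and because that word is shorter than 3 characters, nothing is left to match against terms_txt after the length check.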

Some of the tokens will then be eliminated, and that is why you did not get the expected results. To query correctly, enclose the query text in double quotes, for example field:"I have a problem".
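As a hedged illustration of the quoting advice, the sketch below builds request parameters similar to the ones in the question's log, but with the fq text wrapped in double quotes and debugQuery=true added. The helper name build_select_params is hypothetical, only a subset of the original parameters is reproduced, and the code only builds the query string; it does not contact a Solr server. Whether this resolves the problem for this particular schema is not confirmed by the source.

```python
from urllib.parse import urlencode


def build_select_params(field, text, debug=True):
    """Build /select parameters with the filter text quoted as one phrase."""
    # Escape backslashes and quotes so they cannot end the phrase early.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", text),
        ("defType", "edismax"),
        ("qf", field),
        ("mm", "100%"),
        # Quoted, every word stays bound to `field`; unquoted, only the
        # first word would be, and the rest would go to the default field.
        ("fq", f'{field}:"{escaped}"'),
        ("wt", "json"),
    ]
    if debug:
        # Asks Solr to include the parsed query in the response.
        params.append(("debugQuery", "true"))
    return params


params = build_select_params("terms_txt", "I have some solr problem")
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to the /select path of the core. With debugQuery enabled, the debug section of the response shows how each fq was actually parsed, which makes it possible to check that all words ended up on terms_txt.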