Solr - results stop matching partway through a word

Posted: 2012-08-01 08:33:40

Tags: solr wildcard

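I'm not quite sure how to word this title. Basically, when I search for 'anim' it finds 'Animals', but when I search for 'anima' it finds nothing. Then, if I search for 'animal', it finds 'Animals' again...

Does anyone have any idea why it might not work for 'anima'? It seems to happen with most words, but at a different character each time. For example, 'eleph' and 'elephan' work fine, but 'elepha' returns nothing.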

Here are the queries and results:

Query 1 (works)

/solr/select?fq=type:tag&q=name:anim

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="fq">type:tag</str>
<str name="q">name:anim</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<int name="id">1</int>
<str name="name">Animals</str>
<arr name="name_auto">
<str>Animals</str>
<str>Animals</str>
</arr>
<date name="timestamp">2012-08-01T08:16:38.789Z</date>
<str name="type">tag</str>
<str name="unique_id">tag_1</str>
</doc>
</result>
</response>

Query 2 (doesn't work)

/solr/select?fq=type:tag&q=name:anima

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="fq">type:tag</str>
<str name="q">name:anima</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

Query 3 (works)

/solr/select?fq=type:tag&q=name:animal

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="fq">type:tag</str>
<str name="q">name:animal</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<int name="id">1</int>
<str name="name">Animals</str>
<arr name="name_auto">
<str>Animals</str>
<str>Animals</str>
</arr>
<date name="timestamp">2012-08-01T08:16:38.789Z</date>
<str name="type">tag</str>
<str name="unique_id">tag_1</str>
</doc>
</result>
</response>

Edit 1:

Field definition:

 <field name="name" type="text" indexed="true" stored="true" required="true" />

fieldType:

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

Edit 2:

Passing the strings through the analyzer:

1 answer:

Answer (score: 1)

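Ansari is right: the problem is caused by stemming, and the schema you posted confirms it, because both the index and the query analyzers use PorterStemFilterFactory. At index time 'Animals' is lowercased and stemmed to the term 'anim'. At query time 'anim' stays 'anim' and 'animal' is also stemmed to 'anim', so both match. 'anima', however, has no suffix that the Porter stemmer strips, so it is searched as 'anima', which is not an indexed term. If you want to search for partial words, you can try wildcard queries, depending on which query parser you use. On Solr 3.x they may be too slow, but their performance improved a lot in Solr 4.x. What you probably want is EdgeNGrams, so that 'anima' also matches 'animals'.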
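A minimal sketch of the EdgeNGram approach, assuming a Solr 3.x/4.x-style schema.xml. The field type `text_prefix`, the field `name_prefix` and the gram sizes are hypothetical choices, not part of the original schema. The index analyzer lowercases and then emits the leading n-grams of each token with no stemming, so 'Animals' is indexed as an, ani, anim, anima, animal, animals. The query analyzer only tokenizes and lowercases, so 'anima' is looked up as-is and hits the 'anima' gram.

```xml
<!-- Hypothetical field type for prefix matching (names and sizes are assumptions). -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit leading n-grams of 2..25 characters; no stemming on this field. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query side: no n-gramming and no stemming, so the typed prefix is matched verbatim. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field populated from "name" at index time. -->
<field name="name_prefix" type="text_prefix" indexed="true" stored="false"/>
<copyField source="name" dest="name_prefix"/>
```

After a schema change like this the documents have to be reindexed, after which a query such as /solr/select?fq=type:tag&q=name_prefix:anima should match 'Animals'. With minGramSize="2", single-letter queries will not match; lowering it or raising maxGramSize increases the index size.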