Solr schema for prefix search, howto?

Asked: 2012-01-11 15:53:25

Tags: php solr lucene full-text-search

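I have read many questions on Stack Overflow but have not found an answer on how to do prefix search in Solr. For example, given the text "solr documentation is unreadable", I need it to match queries such as "solr docu*", "document unread*", and "unreadable is so*", but not "un* so*". This is what I did: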

<fieldType name="prefix_search" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30" side="front"/>
  </analyzer>
</fieldType>

But it sometimes returns unexpected results, and the query "un* so*" still matches. Maybe the problem is in the PHP SolrClient? Thanks for any replies!

1 answer:

Answer 0: (score: 1)

ReversedWildcardFilterFactory is exactly what you want. You can then easily test it with curl, like this (the field type definition follows the command):

curl 'http://example.com:8080/solr/select?q=prefix_search:un*+AND+prefix_search:so*'

<!-- Just like text_general except it reverses the characters of
     each token, to enable more efficient leading wildcard queries. -->
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
       maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
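
A side note on the `prefix_search` type from the question: it declares a single `<analyzer>`, so the edge n-gram filter runs at query time as well as at index time. A plain (non-wildcard) query term such as `docu` is then itself broken into n-grams before matching, which is one plausible source of unexpected results. The following is a minimal sketch, assuming the n-grams are only wanted in the index. The name `prefix_search_split` is hypothetical, and the sketch does not by itself reject "un* so*", because each query term is still matched independently of the others.

```xml
<!-- Hypothetical variant of prefix_search: n-grams at index time only. -->
<fieldType name="prefix_search_split" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- Index every leading prefix of each token: s, so, sol, solr, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter here: the typed term is compared as-is against the indexed prefixes. -->
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With a field of this type the prefixes are already stored in the index, so a query such as `prefix_search_split:docu` should not need a trailing `*`.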