Solr FuzzyLookupFactory exact match is case-sensitive

Time: 2016-12-25 10:39:12

Tags: solr lucene autosuggest fuzzy-search search-suggestion

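This may be a duplicate question, but I could not find anything related to it:

I have implemented Solr suggestions for a list of cities and areas, using FuzzyLookupFactory. My schema looks like this: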

<fieldType name="suggestTypeLc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

synonyms.txt is used to map old city names to their new names, for example Madras => Chennai, Saigon => Ho Chi Minh City.

My suggester is defined as follows:

  <searchComponent name="suggest" class="solr.SuggestComponent">
        <lst name="suggester">
              <str name="name">suggestions</str>
              <str name="lookupImpl">FuzzyLookupFactory</str>
              <str name="dictionaryImpl">DocumentDictionaryFactory</str>
              <str name="field">searchfield</str>
              <str name="weightField">searchscore</str>
              <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
              <str name="buildOnStartup">false</str>
              <str name="buildOnCommit">false</str>
              <str name="storeDir">autosuggest_dict</str>
        </lst>
  </searchComponent>

My request handler looks like this:

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
        <lst name="defaults">
                <str name="suggest">true</str>
                <str name="suggest.count">10</str>
                <str name="suggest.dictionary">suggestions</str>
                <str name="suggest.dictionary">results</str>
        </lst>
        <arr name="components">
                <str>suggest</str>
        </arr>
  </requestHandler>

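The problem is that the suggester shows the exact match first, but only when the case matches. For example,

/suggest?suggest.q=mumbai (starting with a lowercase "m")

returns the exact match in 4th place: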

{
  "responseHeader":{
    "status":0,
    "QTime":19},
  "suggest":{
    "suggestions":{
      "mumbai":{
        "numFound":10,
        "suggestions":[{
            "term":"Mumbai Domestic Airport",
            "weight":11536},
          {
            "term":"Mumbai Chhatrapati Shivaji Intl Airport",
            "weight":11376},
          {
            "term":"Mumbai Pune Highway",
            "weight":2850},
          {
            "term":"Mumbai",
            "weight":2248},
.....

However, calling /suggest?suggest.q=Mumbai (starting with an uppercase "M")

returns the exact match in first place:

{
  "responseHeader":{
    "status":0,
    "QTime":16},
  "suggest":{
    "suggestions":{
      "Mumbai":{
        "numFound":10,
        "suggestions":[{
            "term":"Mumbai",
            "weight":2248},
          {
            "term":"Mumbai Domestic Airport",
            "weight":11536},
          {
            "term":"Mumbai Chhatrapati Shivaji Intl Airport",
            "weight":11376},
          {
            "term":"Mumbai Pune Highway",
            "weight":2850},
...

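What am I missing here? What can be done to make Mumbai the first result even when the query is the lowercase "mumbai"? I thought case sensitivity was handled by the "suggestTypeLc" field type I defined.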

1 Answer:

Answer 0: (score: 1)

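FuzzyLookupFactory has a hidden configuration parameter, exactMatchFirst, which is described as:

If true, the default, exact suggestions are returned first, even if they are prefixes or other strings in the FST have larger weights.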
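Because true is the default, the suggester in the question already runs with this behavior even though the parameter never appears in its definition. As a minimal sketch, assuming the parameter is simply written out inside the existing suggester block, it would sit next to the other options. Every other value here is copied from the question.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestions</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <!-- Spells out the default; false should drop the exact-match promotion
         and leave ordering to the weights alone -->
    <str name="exactMatchFirst">true</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="weightField">searchscore</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="storeDir">autosuggest_dict</str>
  </lst>
</searchComponent>
```

Since buildOnStartup and buildOnCommit are both false in this setup, the suggester would presumably need to be rebuilt by hand (for example with suggest.build=true on a request) before a changed setting shows up in results.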

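With your configuration, suggestions are ranked by the searchscore field (your configuration references it as <str name="weightField">searchscore</str>). That is why, when you query mumbai, all suggestions are ordered by weight.

With exactMatchFirst=true, however, Mumbai is placed at the top for the query Mumbai regardless of the weighting mechanism. This is exactly how exactMatchFirst affects the ordering. Your two responses suggest that the exact-match check compares the query as typed against the stored suggestion text, so the lowercase mumbai does not count as an exact match for Mumbai and falls back to weight order.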

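Unfortunately, I did not find an option for tuning your suggester short of getting rid of weightField entirely.

Try turning off field weighting, or try a different lookup implementation such as AnalyzingInfixLookupFactory.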
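The two options above could look roughly like the sketch below. It has not been tested against the data in the question. The suggester names suggestionsNoWeight and suggestionsInfix, the storeDir value and the indexPath value are invented for illustration; everything else is copied from the original definition. Whether either variant actually puts Mumbai first for the query mumbai would have to be verified after a rebuild.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <!-- Option 1: same fuzzy lookup, but with no weightField, so no
       per-document weight competes with the exact match -->
  <lst name="suggester">
    <str name="name">suggestionsNoWeight</str>            <!-- hypothetical name -->
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="storeDir">autosuggest_dict_noweight</str>  <!-- hypothetical path -->
  </lst>

  <!-- Option 2: a different lookup implementation -->
  <lst name="suggester">
    <str name="name">suggestionsInfix</str>               <!-- hypothetical name -->
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="weightField">searchscore</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
    <str name="indexPath">autosuggest_infix_index</str>   <!-- hypothetical path -->
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```

To compare them, a request could select one suggester with suggest.dictionary=suggestionsNoWeight or suggest.dictionary=suggestionsInfix, with suggest.build=true on the first call so the dictionary exists.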