Highlighting gives strange results

Time: 2017-12-14 10:08:28

Tags: elasticsearch highlight

I'm using Elasticsearch, and my highlighting doesn't give me what I expect. My settings are as follows:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 25,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }
}
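To see what my_analyzer actually stores, here is a minimal Python sketch of it under a simplified model: a word is a run of letters and digits, every 2- to 25-character substring of a word becomes a term (in this sketch ordered by start offset, then length), and asciifolding is approximated by stripping accents with unicodedata. It is not Lucene's implementation, and the function names are hypothetical. The point is that even a short word like "Kit" is indexed as ki, kit and it, each with its own character offsets, which the highlighter later relies on.

```python
import re
import unicodedata

# Simplified model of "my_analyzer": ngram tokenizer (min_gram=2, max_gram=25,
# token_chars=[letter, digit]) + lowercase + an approximation of asciifolding.
MIN_GRAM, MAX_GRAM = 2, 25


def fold(text):
    # Approximates asciifolding by removing combining accents ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def ngram_tokens(text):
    """Return (term, start, end) tuples; offsets point into the original text."""
    tokens = []
    # A "word" here is a run of letters/digits; everything else splits tokens.
    for m in re.finditer(r"[^\W_]+", text):
        word = m.group()
        for i in range(len(word)):
            # Every gram that starts at i and fits inside the word.
            for size in range(MIN_GRAM, min(MAX_GRAM, len(word) - i) + 1):
                term = fold(word[i:i + size].lower())
                tokens.append((term, m.start() + i, m.start() + i + size))
    return tokens


if __name__ == "__main__":
    for token in ngram_tokens("Kit 50m"):
        print(token)
```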

I put some products into my index:

PUT index/product/1
{
  "name": "Kit Guirlande Guinguette 50m Transparent",
  "field2": "foo"
}

PUT index/product/2
{
  "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
  "field2": "foo"
}

The mappings for name and field2:

"name": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    },
    "search": {
      "type": "text",
      "analyzer": "my_analyzer",
      "search_analyzer": "standard"
    }
  },
  "analyzer": "my_analyzer"
},
"field2": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  },
  "analyzer": "my_analyzer"
}
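The only difference between name and name.search is which analyzer processes the query text: name has no search_analyzer, so queries against it also go through my_analyzer, while name.search uses standard. The sketch below models this in Python with simplified, hypothetical stand-ins: ngram_terms for my_analyzer and standard_terms for the standard analyzer (the real one uses Unicode text segmentation, not this regular expression). With these stand-ins, "guirlande gui" becomes 39 grams such as gu, ui and la when searched against name, but only guirlande and gui against name.search.

```python
import re

MIN_GRAM, MAX_GRAM = 2, 25


def ngram_terms(text):
    # Simplified my_analyzer: lowercase, split into letter/digit words,
    # then emit every 2..25-character substring of each word.
    terms = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        for start in range(len(word)):
            for size in range(MIN_GRAM, min(MAX_GRAM, len(word) - start) + 1):
                terms.append(word[start:start + size])
    return terms


def standard_terms(text):
    # Rough stand-in for the standard analyzer: lowercased whole words.
    return re.findall(r"[^\W_]+", text.lower())


# Which analyzer turns the *query* text into terms, per field in the mapping.
SEARCH_ANALYZER = {"name": ngram_terms, "name.search": standard_terms}

if __name__ == "__main__":
    for field, analyze in SEARCH_ANALYZER.items():
        print(field, analyze("guirlande gui"))
```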

My search:

GET index/product/_search
{
  "query": {
    "multi_match": {
      "query": "guirlande gui",
      "fields": ["name", "field2"],
      "minimum_should_match": "100%"
    }
  },
  "highlight": {
    "fields": {
      "name.search": {
        "highlight_query": {
          "match": {
            "name.search": {
              "query": "guirlande gui"
            }
          }
        }
      }
    }
  }
}
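Both products are returned because name has no search_analyzer, so the multi_match text is also cut into grams, and "minimum_should_match": "100%" then requires all of them in a single field. A rough Python model, assuming each query gram becomes one clause and a field matches when every gram is present (scoring and duplicate clauses are ignored; the helper names are hypothetical):

```python
import re


def grams(text, min_gram=2, max_gram=25):
    # Set of 2..25-grams of every lowercased letter/digit word (simplified my_analyzer).
    out = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        for start in range(len(word)):
            for size in range(min_gram, min(max_gram, len(word) - start) + 1):
                out.add(word[start:start + size])
    return out


def matches_all(query, field_value):
    # minimum_should_match 100%: every query gram must be an indexed gram of the field.
    return grams(query) <= grams(field_value)


if __name__ == "__main__":
    for value in ["Kit Guirlande Guinguette 50m Transparent", "foo"]:
        print(value, matches_all("guirlande gui", value))
```

Under this model, any name containing "guirlande" matches, while field2 ("foo") never does, which is consistent with "total": 2 in the response.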

Response:

{
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "index",
        "_type": "product",
        "_id": "1",
        "_source": {
          "name": "Guirlande Guinguette Blanc 20 Bulbes 10M"
        },
        "highlight": {
          "name.search": [
            " <em>Guirlande<em> Guinguette Blanc 20 Bulbes 10M"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "product",
        "_id": "2",
        "_source": {
          "name": "Kit Guirlande Guinguette 30m Blanche"
        },
        "highlight": {
          "name.search": [
            " Kit Guirlande Guinguette 30m Blanche"
          ]
        }
      }
    ]
  }
}

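But for the second hit I expect " Kit <em>Guirlande Gui</em>nguette 30m Blanche". The problem seems to occur when the matching part is not at the beginning of the field, but I can't figure out why.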

Edit: I also tried setting the highlighter type to "unified". It's better, but still not what I want. It gives me:

{
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "index",
        "_type": "product",
        "_id": "1",
        "_source": {
          "name": "Guirlande Guinguette Blanc 20 Bulbes 10M"
        },
        "highlight": {
          "name": [
            " <em>Guirlande Gui</em>nguette Blanc 20 Bulbes 10M"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "product",
        "_id": "2",
        "_source": {
          "name": "Kit Guirlande Guinguette 30m Blanche"
        },
        "highlight": {
          "name": [
            " Kit<span class="highlight"> G</span><span class="highlight">u</span><span class="highlight">i</span><span class="highlight">r</span><span class="highlight">l</span><span class="highlight">a</span><span class="highlight">n</span><span class="highlight">d</span><span class="highlight">e</span><span class="highlight"> </span><span class="highlight">G</span><span class="highlight">u</span><span class="highlight">i</span>n<span class="highlight">gu</span>ett<span class="highlight">e </span>30m B<span class="highlight">la</span><span class="highlight">n</span>che"
          ]
        }
      }
    ]
  }
}

That isn't very readable, so a picture may help: [image]

You can see that I'm on the right track, but I also get a lot of unwanted highlighting, such as "lan" and "e" in "Blanche", or the second "gu" in "Guinguette".
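The stray pieces are what you would expect if the highlight query itself is analyzed with the ngram analyzer (as it is for name, which has no search_analyzer): query grams such as gu, la and lan also occur inside "Guinguette" and "Blanche". The simplified Python model below marks every character covered by an indexed gram equal to a query term and wraps each marked run in tags. It is a sketch with hypothetical names, not Elasticsearch's highlighter, and it does not reproduce the exact tags shown above, but it produces the same unwanted second "gu" and the "lan" in "Blanche".

```python
import re

MIN_GRAM, MAX_GRAM = 2, 25


def ngram_tokens(text):
    """(term, start, end) for every 2..25-gram of every letter/digit word."""
    tokens = []
    for m in re.finditer(r"[^\W_]+", text):
        word = m.group()
        for i in range(len(word)):
            for size in range(MIN_GRAM, min(MAX_GRAM, len(word) - i) + 1):
                tokens.append((word[i:i + size].lower(), m.start() + i, m.start() + i + size))
    return tokens


def highlight(text, query_terms, pre="<em>", post="</em>"):
    # Mark every character covered by an indexed gram that equals a query term,
    # then wrap each contiguous run of marked characters in pre/post tags.
    wanted = set(query_terms)
    marked = [False] * len(text)
    for term, start, end in ngram_tokens(text):
        if term in wanted:
            for k in range(start, end):
                marked[k] = True
    out = []
    for k, ch in enumerate(text):
        if marked[k] and (k == 0 or not marked[k - 1]):
            out.append(pre)
        out.append(ch)
        if marked[k] and (k == len(text) - 1 or not marked[k + 1]):
            out.append(post)
    return "".join(out)


if __name__ == "__main__":
    # The highlight query analyzed with the same ngram analyzer.
    query_terms = [t for t, _, _ in ngram_tokens("guirlande gui")]
    print(highlight("Kit Guirlande Guinguette 30m Blanche", query_terms))
    # Kit <em>Guirlande</em> <em>Gui</em>n<em>gu</em>ette 30m B<em>lan</em>che
```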

Mapping: [image]

Analyzer: [image]

Search: [image]

2 answers:

Answer 0 (score: 0)

In my opinion, this is what your analyzer and your query should look like:

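  • It should have a lowercase filter.
  • It should have a second subfield named search (or whatever name you choose) that uses the same index analyzer but a different search analyzer.
  • The search subfield should be the field used in the highlight section, with its own highlight_query.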

DELETE my_index
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 25,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "search": {
              "type": "text",
              "search_analyzer": "standard",
              "analyzer": "my_analyzer"
            }
          },
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT my_index/product/1
{
  "name": "Kit Guirlande Guinguette 50m Transparent",
  "field2": "foo"
}

PUT my_index/product/2
{
  "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
  "field2": "foo"
}

GET my_index/product/_search
{
  "query": {
    "multi_match": {
      "query": "Guirlande Gui",
      "fields": [
        "name",
        "field2"
      ],
      "minimum_should_match": "100%"
    }
  },
  "highlight": {
    "fields": {
      "name.search": {
        "highlight_query": {
          "match": {
            "name.search": {
              "query": "Guirlande Gui"
            }
          }
        }
      }
    }
  }
}
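With this mapping, name.search still stores every gram at index time, but the standard search analyzer leaves "Guirlande Gui" as the two whole terms guirlande and gui, so only grams equal to one of those words get highlighted. A simplified Python model of that effect (hypothetical names; a regular expression stands in for the standard analyzer):

```python
import re

MIN_GRAM, MAX_GRAM = 2, 25


def ngram_tokens(text):
    # Index side (my_analyzer, simplified): every 2..25-gram with its offsets.
    tokens = []
    for m in re.finditer(r"[^\W_]+", text):
        word = m.group()
        for i in range(len(word)):
            for size in range(MIN_GRAM, min(MAX_GRAM, len(word) - i) + 1):
                tokens.append((word[i:i + size].lower(), m.start() + i, m.start() + i + size))
    return tokens


def standard_terms(text):
    # Search side (rough stand-in for "standard"): lowercased whole words.
    return re.findall(r"[^\W_]+", text.lower())


def highlight(text, query_terms, pre="<em>", post="</em>"):
    # Mark characters covered by an indexed gram equal to a query term,
    # then wrap each contiguous marked run in tags.
    wanted = set(query_terms)
    marked = [False] * len(text)
    for term, start, end in ngram_tokens(text):
        if term in wanted:
            for k in range(start, end):
                marked[k] = True
    out = []
    for k, ch in enumerate(text):
        if marked[k] and (k == 0 or not marked[k - 1]):
            out.append(pre)
        out.append(ch)
        if marked[k] and (k == len(text) - 1 or not marked[k + 1]):
            out.append(post)
    return "".join(out)


if __name__ == "__main__":
    terms = standard_terms("Guirlande Gui")  # ["guirlande", "gui"]
    print(highlight("Kit Guirlande Guinguette 50m Transparent", terms))
    # Kit <em>Guirlande</em> <em>Gui</em>nguette 50m Transparent
```

The model wraps each run separately, so the space between the two words stays outside the tags; how Elasticsearch's highlighters group adjacent matches may differ.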

Answer 1 (score: 0)

So it works now. Here is my final configuration:
