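Highlighting words with spaces in Elasticsearch 7.6

Date: 2020-05-01 10:42:46

Tags: elasticsearch elasticsearch-dsl elasticsearch-query

I want to use Elasticsearch highlighting to get the matched keywords found inside a text. These are my settings/mappings: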

{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "- => _"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "description": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fielddata": true
      }
    }
  }
}

I am using the char_filter so that hyphenated words can be searched and highlighted as a single token. Here is a sample document:

{
    "_index": "test_tokenizer",
    "_type": "_doc",
    "_id": "DbBIxXEBL7VGAl98vIRl",
    "_score": 1.0,
    "_source": {
        "title": "Best places: New Mexico and Sedro-Woolley",
        "description": "This is an example text containing some cities like New York, Toronto, Rome and many other. So, there are also Milton-Freewater and Las Vegas!"
    }
}
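
To make the effect of the analyzer on this text easier to see, here is a minimal Python sketch that approximates the `my_analyzer` chain (mapping char filter, tokenizer, lowercase filter). It is not the real Elasticsearch standard tokenizer: the regex `\w+` is only an assumed stand-in that behaves similarly for plain ASCII text, and `approximate_analyze` is a hypothetical helper name. It suggests why "Milton-Freewater" ends up as one token while "New York" is split into two.

```python
import re

# Rough stand-in for the "my_analyzer" chain; this is NOT the real
# Elasticsearch standard tokenizer, only an approximation for ASCII text.
CHAR_MAPPINGS = {"-": "_"}  # mirrors the "- => _" mapping char filter


def approximate_analyze(text):
    # 1. char_filter: replace each mapped character before tokenizing
    for src, dst in CHAR_MAPPINGS.items():
        text = text.replace(src, dst)
    # 2. tokenizer: keep runs of letters, digits and underscores together
    tokens = re.findall(r"\w+", text)
    # 3. filter: lowercase every token
    return [token.lower() for token in tokens]


print(approximate_analyze("New York, Rome and Milton-Freewater"))
# prints ['new', 'york', 'rome', 'and', 'milton_freewater']
```

To inspect the tokens the cluster really produces, the `_analyze` API can be called on the index with `"analyzer": "my_analyzer"`.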

This is the query I am using:

{
    "query": {
        "query_string" : {
            "query" : "\"New York\" OR \"Rome\" OR \"Milton-Freewater\"",
            "default_field": "description"
        }
    },
    "highlight" : {
        "pre_tags" : ["<key>"],
        "post_tags" : ["</key>"],
        "fields" : {
            "description" : {
                "number_of_fragments" : 0
            }
        }
    }
}

This is my output:

...
"hits": [
    {
        "_index": "test_tokenizer",
        "_type": "_doc",
        "_id": "GrDNz3EBL7VGAl98EITg",
        "_score": 0.72928625,
        "_source": {
            "title": "Best places: New Mexico and Sedro-Woolley",
            "description": "This is an example text containing some cities like New York, Toronto, Rome and many other. So, there are also Milton-Freewater and Las Vegas!"
        },
        "highlight": {
            "description": [
                "This is an example text containing some cities like <key>New</key> <key>York</key>, Toronto, <key>Rome</key> and many other. So, there are also <key>Milton-Freewater</key> and Las Vegas!"
            ]
        }
    }
]
...

Rome and Milton-Freewater are highlighted correctly. New York is not.

How can I get <key>New York</key> instead of <key>New</key> <key>York</key>?

1 answer:

Answer 0 (score: 1)

There is an open PR for this, but I would suggest the following interim solution:

  1. Add a term_vector setting
PUT test_tokenizer
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "- => _"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "description": {
        "type": "text",
        "analyzer": "my_analyzer",
        "term_vector": "with_positions_offsets",
        "fielddata": true
      }
    }
  }
}
  2. Index the document
POST test_tokenizer/_doc
{"title":"Best places: New Mexico and Sedro-Woolley","description":"This is an example text containing some cities like New York, Toronto, Rome and many other. So, there are also Milton-Freewater and Las Vegas!"}
  3. Convert your query_string into a set of bool-should match_phrase queries inside a highlight_query, and use type: fvh
GET test_tokenizer/_search
{
  "query": {
    "query_string": {
      "query": "\"New York\" OR \"Rome\" OR \"Milton-Freewater\"",
      "default_field": "description"
    }
  },
  "highlight": {
    "pre_tags": [
      "<key>"
    ],
    "post_tags": [
      "</key>"
    ],
    "fields": {
      "description": {
        "highlight_query": {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "description": "New York"
                }
              },
              {
                "match_phrase": {
                  "description": "Rome"
                }
              },
              {
                "match_phrase": {
                  "description": "Milton-Freewater"
                }
              }
            ]
          }
        },
        "type": "fvh",
        "number_of_fragments": 0
      }
    }
  }
}

which yields:

{
  "highlight":{
    "description":[
      "This is an example text containing some cities like <key>New York</key>, Toronto, <key>Rome</key> and many other. So, there are also <key>Milton-Freewater</key> and Las Vegas!"
    ]
  }
}
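
If the list of phrases changes often, the conversion in step 3 can be generated rather than written by hand. The following Python sketch builds the same request body from a list of phrases. `build_search_body` and its parameters are hypothetical names chosen for illustration, and the sketch assumes the phrases contain no double quotes or other characters that are special in query_string syntax.

```python
import json


def build_search_body(phrases, field="description", tag="key"):
    """Build a search body shaped like the one in step 3.

    Assumption: phrases contain no double quotes or other characters
    that would need escaping in query_string syntax.
    """
    # query_string treats text in double quotes as a phrase
    query = " OR ".join('"{}"'.format(phrase) for phrase in phrases)
    # one match_phrase clause per phrase, used only for highlighting
    should = [{"match_phrase": {field: phrase}} for phrase in phrases]
    return {
        "query": {
            "query_string": {"query": query, "default_field": field}
        },
        "highlight": {
            "pre_tags": ["<{}>".format(tag)],
            "post_tags": ["</{}>".format(tag)],
            "fields": {
                field: {
                    "highlight_query": {"bool": {"should": should}},
                    # fvh needs term_vector: with_positions_offsets on the field
                    "type": "fvh",
                    "number_of_fragments": 0,
                }
            },
        },
    }


body = build_search_body(["New York", "Rome", "Milton-Freewater"])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of `GET test_tokenizer/_search`, or passed to whichever client you use.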