How to highlight ngram tokens inside words with Elasticsearch

Date: 2017-05-26 15:31:45

Tags: elasticsearch highlight

I want to highlight only the matched ngrams, not the whole word. For example:

term: "Wo"
highlight should be: "<em>Wo</em>nderful world!"
currently it is: "<em>Wonderful</em> world!"

The mapping is:

{
  "global_search_1495732922733" : {
    "mappings" : {
      "meeting" : {
        "properties" : {
        ...
          "name" : {
            "type" : "text",
            "analyzer" : "meeteor_index_analyzer",
            "search_analyzer" : "meeteor_search_term_analyzer"
          },
          ...
        }
      }
    }
  }
}
The analyzers are:

"analysis" : {
  "filter" : {
    "meeteor_stemmer" : {
      "name" : "english",
      "type" : "stemmer"
    },
    "meeteor_ngram" : {
      "type" : "nGram",
      "min_gram" : "2",
      "max_gram" : "15"
    }
  },
  "analyzer" : {
    "meeteor_search_term_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer" : "standard"
    },
    "meeteor_index_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding",
        "meeteor_ngram"
      ],
      "tokenizer" : "standard"
    },
    "meeteor_project_id_analyzer" : {
      "tokenizer" : "standard"
    }
  }
},

A concrete example:

curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "name": "Me"
        }
    },
    "highlight":{
      "fields": {
        "name": {}
      }
    }
}
'

The result is:

 "...highlight" : {
          "name" : [
            "Sad <em>Meeting</em>"
          ]
        }
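Why the whole word gets highlighted can be illustrated with a small Python simulation (a simplification, not Lucene's implementation): the highlighter wraps the start and end offsets of each indexed token that matches the query term, and the ngram token filter gives every gram the offsets of the whole word it was cut from. The regex split standing in for the standard tokenizer and the helper names `index_tokens_with_ngram_filter` and `highlight` are hypothetical.

```python
import re


def index_tokens_with_ngram_filter(text, min_gram=2, max_gram=15):
    """Approximate standard tokenizer + lowercase + ngram *token filter*.

    Every gram keeps the start/end offsets of the whole word it came from.
    """
    tokens = []
    # Rough stand-in (assumption) for the standard tokenizer: split on word characters.
    for match in re.finditer(r"\w+", text):
        word = match.group().lower()
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            for i in range(len(word) - size + 1):
                tokens.append((word[i:i + size], match.start(), match.end()))
    return tokens


def highlight(text, tokens, term):
    """Wrap the offsets of tokens equal to the lowercased term in <em>, merging overlaps."""
    spans = sorted({(s, e) for tok, s, e in tokens if tok == term.lower()})
    merged = []
    for start, end in spans:
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    out, pos = [], 0
    for start, end in merged:
        out.append(text[pos:start])
        out.append("<em>" + text[start:end] + "</em>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "Sad Meeting"
print(highlight(text, index_tokens_with_ngram_filter(text), "Me"))
# Sad <em>Meeting</em>
```

The gram "me" is indexed with offsets (4, 11), the same as "meeting", so the only span the highlighter can mark is the full word.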

1 answer:

Answer (score: 2)

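The right way to get what you want is to use ngram as the tokenizer instead of as a token filter. The ngram token filter gives every gram the same start and end offsets as the word it was cut from, so the highlighter can only mark the whole word; the ngram tokenizer records each gram's own offsets in the original text. You can configure it like this: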

"analysis" : {
  "filter" : {
    "meeteor_stemmer" : {
      "name" : "english",
      "type" : "stemmer"
    }
  },
  "tokenizer" : {
    "meeteor_ngram_tokenizer" : {
      "type" : "nGram",
      "min_gram" : "2",
      "max_gram" : "15"
    }
  },
  "analyzer" : {
    "meeteor_search_term_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer" : "standard"
    },
    "meeteor_index_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer" : "meeteor_ngram_tokenizer"
    },
    "meeteor_project_id_analyzer" : {
      "tokenizer" : "standard"
    }
  }
},

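After recreating the index with these settings and reindexing the documents, the highlight covers only the matched ngram: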

 "...highlight" : {
          "name" : [
            "Sad <em>Me</em>eting"
          ]
        }
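A similar hypothetical Python simulation (helper names invented for illustration, not Lucene code) shows why the tokenizer version works: each gram is cut directly from the original text, so it carries its own offsets, and the gram "me" maps only to offsets 4 to 6 (end exclusive) of "Sad Meeting".

```python
def index_tokens_with_ngram_tokenizer(text, min_gram=2, max_gram=15):
    """Approximate an ngram tokenizer (no token_chars restriction) + lowercase filter.

    Every character, including spaces, is kept, and each gram carries its own
    start/end offsets in the original text.
    """
    tokens = []
    for size in range(min_gram, min(max_gram, len(text)) + 1):
        for i in range(len(text) - size + 1):
            tokens.append((text[i:i + size].lower(), i, i + size))
    return tokens


def highlight(text, tokens, term):
    """Wrap the offsets of tokens equal to the lowercased term in <em>, merging overlaps."""
    spans = sorted({(s, e) for tok, s, e in tokens if tok == term.lower()})
    merged = []
    for start, end in spans:
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    out, pos = [], 0
    for start, end in merged:
        out.append(text[pos:start])
        out.append("<em>" + text[start:end] + "</em>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "Sad Meeting"
print(highlight(text, index_tokens_with_ngram_tokenizer(text), "Me"))
# Sad <em>Me</em>eting
```

Note that with no restriction the grams also span the space between words (for example "d m"). If that is unwanted, the ngram tokenizer's `token_chars` setting (for example `["letter", "digit"]`) limits grams to those character classes.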