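I have an Elasticsearch field that I index with an ngram tokenizer. Unexpectedly, Elasticsearch does not merge adjacent highlights. For example, for the search term 854511, I get the following highlight: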
DA V50 v335 auf v331 J06A <mark>85</mark><mark>45</mark><mark>11</mark>
whereas I was expecting this:
DA V50 v335 auf v331 J06A <mark>854511</mark>
Here are my analyzers and mappings:
ADDITIONAL_ANALYZERS = {
  analyzer: {
    ngram_analyzer: {
      tokenizer: :ngram_tokenizer,
      filter: 'lowercase'
    }
  },
  tokenizer: {
    ngram_tokenizer: {
      type: :nGram,
      min_gram: 2,
      max_gram: 20,
      token_chars: ['letter', 'digit', 'symbol', 'punctuation']
    }
  }
}

settings analysis: ADDITIONAL_ANALYZERS do
  mappings do
    indexes :name, type: 'multi_field' do
      indexes :name, type: :string, analyzer: :ngram_analyzer, term_vector: :with_positions_offsets
      indexes :not_analyzed, type: :string, index: :not_analyzed
    end
    indexes :mdc, type: :string, index: :not_analyzed
    indexes :description, type: :string, analyzer: :html_ngram_analyzer, term_vector: :with_positions_offsets
    indexes :created_at, type: :date
  end
end
Answer 0 (score: 1)
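Try using the plain highlighter. Because the field is mapped with term_vector: with_positions_offsets, Elasticsearch picks the fast vector highlighter by default; setting "type": "plain" in the highlight request overrides that.
If you try the following query: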
{
  "query": {
    "match": {
      "name": "854511"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "pre_tags": [
          "<mark>"
        ],
        "post_tags": [
          "</mark>"
        ],
        "fragment_size": 150,
        "number_of_fragments": 1,
        "type": "plain"
      }
    }
  }
}
you get the desired result:
"hits": [
  {
    "_index": "test",
    "_type": "test",
    "_id": "1",
    "_score": 0.1856931,
    "_source": {
      "name": "DA V50 v335 auf v331 J06A 854511"
    },
    "highlight": {
      "name": [
        "DA V50 v335 auf v331 J06A <mark>854511</mark>"
      ]
    }
  }
]
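Since the question defines its mappings in Ruby, here is a minimal sketch of building the same request body as a Ruby hash. The helper name plain_highlight_query and its field and term parameters are made up for illustration; how you hand the hash to your client depends on the gem and version you use, so that step is left out.

```ruby
require 'json'

# Hypothetical helper (not from the original post): builds a match query
# that forces the plain highlighter on a single field.
def plain_highlight_query(field, term)
  {
    query: { match: { field => term } },
    highlight: {
      fields: {
        field => {
          pre_tags: ['<mark>'],
          post_tags: ['</mark>'],
          fragment_size: 150,
          number_of_fragments: 1,
          type: 'plain' # override the default highlighter choice for this field
        }
      }
    }
  }
end

# Print the request body as JSON, e.g. to paste into a REST client.
puts JSON.pretty_generate(plain_highlight_query('name', '854511'))
```

The printed JSON should carry the same keys and values as the query above, so it can be compared against it directly before wiring it into the application.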