I am using the Elasticsearch _suggest endpoint to suggest spelling corrections ("did you mean"). One example that has come up is
"gardeing with muclh"
My search request is:
{
  "phrase": {
    "phrase": {
      "field": "summary",
      "max_errors": 0.5,
      "analyzer": "standard",
      "highlight": {
        "pre_tag": "<em>",
        "post_tag": "</em>"
      },
      "gram_size": 1,
      "real_word_error_likelihood": 0.95,
      "direct_generator": [
        {
          "field": "summary",
          "suggest_mode": "missing",
          "min_word_len": 3,
          "min_doc_freq": 5,
          "max_edits": 1
        }
      ]
    },
    "text": "gardeing with muclh"
  },
  "term": {
    "term": {
      "field": "summary",
      "analyzer": "standard",
      "suggest_mode": "missing",
      "size": 3
    },
    "text": "gardeing with muclh"
  }
}
and it returns the following results:
{
  "term": [
    {
      "text": "gardeing",
      "offset": 0,
      "length": 8,
      "options": [
        {
          "text": "gardening",
          "score": 0.875,
          "freq": 512
        },
        {
          "text": "gardenia",
          "score": 0.75,
          "freq": 71
        },
        {
          "text": "gardeninig",
          "score": 0.75,
          "freq": 1
        }
      ]
    },
    {
      "text": "with",
      "offset": 9,
      "length": 4,
      "options": []
    },
    {
      "text": "muclh",
      "offset": 14,
      "length": 5,
      "options": [
        {
          "text": "mulch",
          "score": 0.8,
          "freq": 190
        },
        {
          "text": "much",
          "score": 0.75,
          "freq": 527
        },
        {
          "text": "muscle",
          "score": 0.6,
          "freq": 1
        }
      ]
    }
  ],
  "phrase": [
    {
      "text": "gardeing with muclh",
      "offset": 0,
      "length": 19,
      "options": [
        {
          "text": "gardening with much",
          "highlighted": "<em>gardening</em> with <em>much</em>",
          "score": 0.000007876507
        },
        {
          "text": "gardening with mulch",
          "highlighted": "<em>gardening</em> with <em>mulch</em>",
          "score": 0.000005306385
        },
        {
          "text": "gardening with muclh",
          "highlighted": "<em>gardening</em> with muclh",
          "score": 5.7017786e-7
        }
      ]
    }
  ]
}
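The problem is that the correct version is "gardening with mulch", but the top phrase suggestion that comes back is "gardening with much". Note that the term suggester ranks "mulch" above "much" for "muclh"; I assume this is because it scores transpositions higher than omissions or additions.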
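To sanity-check that assumption, here is a minimal sketch that recomputes the term scores by hand. It assumes (this is not confirmed anywhere above) that the score is `1 - distance / min(len(a), len(b))`, where the distance counts an insertion, deletion, substitution, or adjacent transposition as one edit each. The function names `osa_distance` and `similarity` are made up for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, substitute, and swapping two
    adjacent characters each cost one edit (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "cl" -> "lc".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    # Assumed scoring formula: edits normalized by the shorter word's length.
    return 1 - osa_distance(a, b) / min(len(a), len(b))


if __name__ == "__main__":
    for candidate in ("mulch", "much", "muscle"):
        print(candidate, osa_distance("muclh", candidate),
              round(similarity("muclh", candidate), 3))
```

Under this assumed formula, "mulch" (one transposition) and "much" (one deletion) are both a single edit away from "muclh". The computed values are 0.8, 0.75, and 0.6, the same as the term scores in the response, so the gap between "mulch" and "much" may come from dividing by the shorter word's length rather than from transpositions being weighted differently.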
Update: I fixed this particular problem by adding a "maxTermFrequency" of .2, but this seems like a hack. I would rather solve it more intelligently if possible.
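For reference, a rough sketch of what that workaround might look like in the request body. It assumes that "maxTermFrequency" corresponds to the `max_term_freq` option of the direct generator, and that the generator is where the option was added; the helper name `build_phrase_suggest` is hypothetical. All other settings are copied from the request above.

```python
import json


def build_phrase_suggest(text, field="summary", max_term_freq=None):
    """Build the phrase-suggest part of the request as a plain dict."""
    generator = {
        "field": field,
        "suggest_mode": "missing",
        "min_word_len": 3,
        "min_doc_freq": 5,
        "max_edits": 1,
    }
    if max_term_freq is not None:
        # Assumption: this is the option the update refers to.
        generator["max_term_freq"] = max_term_freq
    return {
        "phrase": {
            "text": text,
            "phrase": {
                "field": field,
                "max_errors": 0.5,
                "analyzer": "standard",
                "highlight": {"pre_tag": "<em>", "post_tag": "</em>"},
                "gram_size": 1,
                "real_word_error_likelihood": 0.95,
                "direct_generator": [generator],
            },
        }
    }


if __name__ == "__main__":
    body = build_phrase_suggest("gardeing with muclh", max_term_freq=0.2)
    print(json.dumps(body, indent=2))
```

Leaving `max_term_freq` as `None` reproduces the phrase part of the original request unchanged, which makes it easy to compare the two variants side by side.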
Is there a way to make "gardening with mulch" the first suggestion, instead of "gardening with much", without resorting to MaxTermFrequency?