How to improve Elasticsearch suggest results

Time: 2015-03-03 10:14:26

Tags: elasticsearch

I'm using the Elasticsearch _suggest endpoint to offer spelling corrections ("did you mean"). One example that has come up is

gardeing with muclh

My suggest request is:

{
  "phrase": {
    "phrase": {
      "field": "summary",
      "max_errors": 0.5,
      "analyzer": "standard",
      "highlight": {
        "pre_tag": "<em>",
        "post_tag": "</em>"
      },
      "gram_size": 1,
      "real_word_error_likelihood": 0.95,
      "direct_generator": [
        {
          "field": "summary",
          "suggest_mode": "missing",
          "min_word_len": 3,
          "min_doc_freq": 5,
          "max_edits": 1
        }
      ]
    },
    "text": "gardeing with muclh"
  },
  "term": {
    "term": {
      "field": "summary",
      "analyzer": "standard",
      "suggest_mode": "missing",
      "size": 3
    },
    "text": "gardeing with muclh"
  }
}

which returns:

{
  "term": [
    {
      "text": "gardeing",
      "offset": 0,
      "length": 8,
      "options": [
        {
          "text": "gardening",
          "score": 0.875,
          "freq": 512
        },
        {
          "text": "gardenia",
          "score": 0.75,
          "freq": 71
        },
        {
          "text": "gardeninig",
          "score": 0.75,
          "freq": 1
        }
      ]
    },
    {
      "text": "with",
      "offset": 9,
      "length": 4,
      "options": []
    },
    {
      "text": "muclh",
      "offset": 14,
      "length": 5,
      "options": [
        {
          "text": "mulch",
          "score": 0.8,
          "freq": 190
        },
        {
          "text": "much",
          "score": 0.75,
          "freq": 527
        },
        {
          "text": "muscle",
          "score": 0.6,
          "freq": 1
        }
      ]
    }
  ],
  "phrase": [
    {
      "text": "gardeing with muclh",
      "offset": 0,
      "length": 19,
      "options": [
        {
          "text": "gardening with much",
          "highlighted": "<em>gardening</em> with <em>much</em>",
          "score": 0.000007876507
        },
        {
          "text": "gardening with mulch",
          "highlighted": "<em>gardening</em> with <em>mulch</em>",
          "score": 0.000005306385
        },
        {
          "text": "gardening with muclh",
          "highlighted": "<em>gardening</em> with muclh",
          "score": 5.7017786e-7
        }
      ]
    }
  ]
}
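To make the mismatch in this response explicit, here is a small sketch that picks the highest-scoring option from each suggester. It works on a trimmed copy of the response above; `best_term_options` and `best_phrase_option` are hypothetical helper names, and it assumes the response body has exactly the `term`/`phrase` shape shown in the question.

```python
# Trimmed copy of the suggest response from the question (only the fields used here).
response = {
    "term": [
        {"text": "gardeing", "options": [
            {"text": "gardening", "score": 0.875, "freq": 512},
            {"text": "gardenia", "score": 0.75, "freq": 71},
            {"text": "gardeninig", "score": 0.75, "freq": 1},
        ]},
        {"text": "with", "options": []},
        {"text": "muclh", "options": [
            {"text": "mulch", "score": 0.8, "freq": 190},
            {"text": "much", "score": 0.75, "freq": 527},
            {"text": "muscle", "score": 0.6, "freq": 1},
        ]},
    ],
    "phrase": [
        {"text": "gardeing with muclh", "options": [
            {"text": "gardening with much", "score": 7.876507e-06},
            {"text": "gardening with mulch", "score": 5.306385e-06},
            {"text": "gardening with muclh", "score": 5.7017786e-07},
        ]},
    ],
}


def best_term_options(resp):
    """Return {input token: highest-scoring term suggestion, or None if there is none}."""
    best = {}
    for entry in resp["term"]:
        options = entry["options"]
        best[entry["text"]] = max(options, key=lambda o: o["score"])["text"] if options else None
    return best


def best_phrase_option(resp):
    """Return the highest-scoring option of the first phrase-suggester entry."""
    options = resp["phrase"][0]["options"]
    return max(options, key=lambda o: o["score"])["text"]


print(best_term_options(response))   # -> {'gardeing': 'gardening', 'with': None, 'muclh': 'mulch'}
print(best_phrase_option(response))  # -> gardening with much
```

The term suggester's winner for "muclh" is "mulch" (score 0.8, freq 190), while the phrase suggester's winner contains "much" (freq 527).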

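The problem is that the correct version is "gardening with mulch", but the phrase suggester returns "gardening with much" as its top match. Note that the term suggester ranks "mulch" above "much" for "muclh", which I believe is because it scores a transposition higher than an omission or an addition.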
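As far as I can tell, the six term scores in the response can be reproduced by a transposition-aware edit distance (optimal string alignment, a restricted form of Damerau-Levenshtein) divided by the length of the shorter word and subtracted from 1. This formula is inferred from the numbers above, not taken from the Elasticsearch documentation, and `osa_distance` and `similarity` are illustrative names. Under this reading, "mulch" and "much" are each exactly one edit away from "muclh"; "mulch" scores higher because that single edit is divided by 5 rather than by 4.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal-string-alignment distance: insertion, deletion, substitution and
    transposition of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - edits / length of the shorter word."""
    return 1.0 - osa_distance(a, b) / min(len(a), len(b))


candidates = {
    "gardeing": ["gardening", "gardenia", "gardeninig"],
    "muclh": ["mulch", "much", "muscle"],
}
for word, options in candidates.items():
    for option in options:
        print(f"{word} -> {option}: edits={osa_distance(word, option)}, "
              f"score={similarity(word, option):.3f}")
```

Each printed score matches the corresponding `score` in the term output (0.875, 0.75, 0.75, 0.8, 0.75, 0.6). Note that "gardenia" only reaches 0.75 if the "in"/"ni" swap counts as one edit, which is why a transposition-aware distance is used here rather than plain Levenshtein.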

Update: I fixed this particular problem by adding "maxTermFrequency" of .2, but that feels like a hack. I'd rather solve it more intelligently if possible.
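For reference, a sketch of the request body above with that setting added. It assumes "maxTermFrequency" corresponds to the `max_term_freq` option of the phrase suggester's direct generator in the REST body, and that the change was made there; the question does not say where it was applied. `build_suggest_body` is a hypothetical helper name.

```python
import json


def build_suggest_body(text, max_term_freq=None):
    """Rebuild the _suggest body from the question; optionally add max_term_freq
    (assumed REST name of the setting) to the phrase direct generator."""
    generator = {
        "field": "summary",
        "suggest_mode": "missing",
        "min_word_len": 3,
        "min_doc_freq": 5,
        "max_edits": 1,
    }
    if max_term_freq is not None:
        generator["max_term_freq"] = max_term_freq
    return {
        "phrase": {
            "phrase": {
                "field": "summary",
                "max_errors": 0.5,
                "analyzer": "standard",
                "highlight": {"pre_tag": "<em>", "post_tag": "</em>"},
                "gram_size": 1,
                "real_word_error_likelihood": 0.95,
                "direct_generator": [generator],
            },
            "text": text,
        },
        "term": {
            "term": {
                "field": "summary",
                "analyzer": "standard",
                "suggest_mode": "missing",
                "size": 3,
            },
            "text": text,
        },
    }


# Serialize the body that would be sent to the _suggest endpoint.
print(json.dumps(build_suggest_body("gardeing with muclh", max_term_freq=0.2), indent=2))
```

Calling it without `max_term_freq` yields the original request unchanged, which makes it easy to compare the two responses side by side.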

Is there a way to get "gardening with mulch" as the first suggestion instead of "gardening with much" without resorting to maxTermFrequency?

0 answers