Question

I have indexed a book:

curl -X PUT localhost:9200/books/book/1 -d '{
    "title": "All Quiet on the Western Front",
    "author": "Erich Maria Remarque",
    "year": 1929
}'

I am trying to implement the phrase suggester using the code from the official docs.

So I tried:

curl -XPOST 'localhost:9200/books/_search' -d '{
  "suggest" : {
    "text" : "al quet",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "body",
        "field" : "bigram",
        "size" : 1,
        "real_word_error_likelihood" : 0.95,
        "max_errors" : 0.5,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "title",
          "suggest_mode" : "always",
          "min_word_length" : 1
        } ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}'

I expected al quet to be corrected to all quiet.

But I get the following error:

  "error" : {
    "root_cause" : [ {
      "type" : "illegal_argument_exception",
      "reason" : "Analyzer [body] doesn't exists"

If I change "analyzer" : "body" to "analyzer" : "title", I get the same error, but for title:

  "error" : {
    "root_cause" : [ {
      "type" : "illegal_argument_exception",
      "reason" : "Analyzer [title] doesn't exists"

If I change "analyzer" : "body" to "analyzer" : "default", that line no longer raises an error, but the next line, "field" : "bigram", does:

  "error" : {
     "root_cause" : [ {
       "type" : "illegal_argument_exception",
       "reason" : "No mapping found for field [bigram]"

The only way I could get it to work was to set "analyzer" : "default" and "field" : "title":

curl -XPOST 'localhost:9200/books/_search?pretty=true' -d '{
  "suggest" : {
    "text" : "al quet",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "default",
        "field" : "title",
        "size" : 1,
        "real_word_error_likelihood" : 0.95,
        "max_errors" : 0.5,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "title",
          "suggest_mode" : "always",
          "min_word_length" : 1
        } ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}'

With this, I get the following output:

 "suggest" : {
    "simple_phrase" : [ {
      "text" : "al quet",
      "offset" : 0,
      "length" : 7,
      "options" : [ {
        "text" : "al quiet",
        "highlighted" : "al <em>quiet</em>",
        "score" : 0.09049256
      } ]
    } ]
  }

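As you can see, it corrects quet to quiet but leaves al untouched. All my other attempts behave the same way: only one word gets corrected.

How can I build a working phrase suggester that, for the example input al quet, returns all quiet?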

Answer 1

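You get the first error because the index has no analyzer named body, and the same goes for title: title is a field, not an analyzer.

The second error is due to the missing field bigram. The index has only three fields: title, author and year.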
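
One way to confirm this, assuming the index is still named books and Elasticsearch listens on localhost:9200 as in the question, is to ask the cluster for the index mapping and settings. This is only a sketch for inspection; the exact shape of the response depends on the Elasticsearch version.

```shell
# List the fields Elasticsearch created for the books index.
# Only title, author and year should appear; there is no bigram field.
curl -XGET 'localhost:9200/books/_mapping?pretty=true'

# List the index settings. Custom analyzers would appear under "analysis";
# an index created implicitly by the first PUT has none defined.
curl -XGET 'localhost:9200/books/_settings?pretty=true'
```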

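With your current setup, the suggester needs a higher value for max_errors in order to correct both words. From the docs, max_errors is

the maximum percentage of the terms that at most can be considered to be misspellings in order to form a correction. This method accepts a float value in the range [0..1) as a fraction of the actual query terms, or a number >= 1 as an absolute number of query terms. The default is set to 1.0, which corresponds to only corrections with at most 1 misspelled term being returned. Note that setting this too high can negatively impact performance. Low values like 1 or 2 are recommended; otherwise the time spent in suggest calls might exceed the time spent in query execution.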
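
To see why 0.5 corrects only one of the two words, here is a rough sketch of the arithmetic. The helper max_misspellings is hypothetical and not part of Elasticsearch. It assumes that a fractional max_errors is multiplied by the number of query terms, rounded to the nearest integer, and never drops below one; the quoted docs state the fraction-versus-absolute rule, but the rounding and the floor of one are assumptions about the internals.

```shell
# Hypothetical helper: estimate how many terms the suggester may correct.
# Usage: max_misspellings MAX_ERRORS NUM_TERMS
max_misspellings() {
  awk -v e="$1" -v n="$2" 'BEGIN {
    if (e >= 1) m = int(e);       # value >= 1: absolute number of terms
    else m = int(e * n + 0.5);    # value < 1: fraction of the terms, rounded (assumed)
    if (m < 1) m = 1;             # assumed floor: at least one correction
    print m
  }'
}

max_misspellings 0.5 2   # setting from the question, two query terms
max_misspellings 0.9 2   # setting from the answer, two query terms
```

Under these assumptions the first call prints 1 and the second prints 2, which matches the behaviour described in the question. Setting max_errors to 2 would request up to two corrected terms directly.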

So this request, with max_errors raised from 0.5 to 0.9, should give you the desired output:

{
  "suggest": {
    "text": "al quet",
    "simple_phrase": {
      "phrase": {
        "analyzer": "default",
        "field": "title",
        "size": 1,
        "real_word_error_likelihood": 0.95,
        "max_errors": 0.9,
        "gram_size": 2,
        "direct_generator": [
          {
            "field": "title",
            "suggest_mode": "always",
            "min_word_length": 1
          }
        ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  },
  "size": 0
}

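You may also want to use shingles for phrases, and collate to return only suggestions that actually match documents in the index. I have given a detailed answer to this question, which may help.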
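
As a rough sketch of the shingle approach: the index name books_shingled, the analyzer name bigram and the filter name shingle_filter below are made up for illustration. The mapping is written for the Elasticsearch 2.x style that the question's requests suggest (a book mapping type, string fields); on 5.x and later the field type would be text, and newer versions also expect a Content-Type: application/json header and drop mapping types, so adjust accordingly.

```shell
# Create a new index whose title field has a sub-field analyzed into bigrams.
curl -XPUT 'localhost:9200/books_shingled' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2
        }
      },
      "analyzer": {
        "bigram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "bigram": { "type": "string", "analyzer": "bigram" }
          }
        }
      }
    }
  }
}'

# Index the same document into the new index.
curl -XPUT 'localhost:9200/books_shingled/book/1' -d '{
    "title": "All Quiet on the Western Front",
    "author": "Erich Maria Remarque",
    "year": 1929
}'

# Run the phrase suggester against the shingled sub-field.
# Candidates still come from the plain title field.
curl -XPOST 'localhost:9200/books_shingled/_search?pretty=true' -d '{
  "size": 0,
  "suggest": {
    "text": "al quet",
    "simple_phrase": {
      "phrase": {
        "field": "title.bigram",
        "size": 1,
        "max_errors": 0.9,
        "gram_size": 2,
        "direct_generator": [ {
          "field": "title",
          "suggest_mode": "always",
          "min_word_length": 1
        } ]
      }
    }
  }
}'
```

With a single short document the language model has very little data to score against, so results on a real corpus may differ from this toy setup.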
