Elasticsearch edge n-gram not searching correctly

Date: 2021-03-22 21:24:12

Tags: ruby-on-rails elasticsearch

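Shown here: Query hits

I searched for "hey", and one of the records returned was "hello".

Another example: Query hits

I then searched for "infrared", and a record with the following content was returned: "This is a message at index: 1".

These are the index settings: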

settings analysis: {
    filter: {
      edge_ngram_filter: {
        type: "edge_ngram",
        min_gram: "2",
        max_gram: "20",
      }
    },
    analyzer: {
      edge_ngram_analyzer: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase", "edge_ngram_filter"]
      }
    }
  } do
    mappings dynamic: true do
      indexes :content, type: :text, analyzer: "edge_ngram_analyzer"
      # indexes :chat_id, type: :long
    end
  end

1 Answer:

Answer 0: (score: 0)

With your index mapping, the tokens generated for hey will be:

GET /_analyze

{
  "tokens": [
    {
      "token": "he",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "hey",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 1
    }
  ]
}

The tokens generated for hello will be:

GET /_analyze

{
  "tokens": [
    {
      "token": "he",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "hel",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 1
    },
    {
      "token": "hell",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 2
    },
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 3
    }
  ]
}

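Because both token lists contain he, a search for hey matches both documents. The mapping defines no separate search analyzer, so the query text is analyzed with the same edge_ngram_analyzer and expands to he and hey, and a match query returns any document that shares at least one of those tokens.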
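The overlap described above can be reproduced without a cluster. The following is a minimal sketch in plain Ruby, not Elasticsearch itself: `edge_ngrams` is a hypothetical helper that only approximates the question's analyzer (a regex split stands in for the standard tokenizer, followed by lowercasing and prefix generation with min_gram 2 and max_gram 20).

```ruby
# Approximates the question's edge_ngram_analyzer: split into words,
# lowercase, then emit each word's prefixes from min_gram to max_gram
# characters. The real standard tokenizer follows Unicode segmentation
# rules; the regex split used here is a simplification.
def edge_ngrams(text, min_gram: 2, max_gram: 20)
  text.downcase.scan(/[[:alnum:]]+/).flat_map do |word|
    upper = [word.length, max_gram].min
    # An empty range (word shorter than min_gram) yields no tokens.
    (min_gram..upper).map { |len| word[0, len] }
  end
end

query_tokens = edge_ngrams("hey")
doc_tokens   = edge_ngrams("hello")

puts "query tokens: #{query_tokens.inspect}" # ["he", "hey"]
puts "doc tokens:   #{doc_tokens.inspect}"   # ["he", "hel", "hell", "hello"]
puts "shared:       #{(query_tokens & doc_tokens).inspect}" # ["he"]
```

The single shared token he is enough for the match query to return the hello document.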


Change your index settings and mapping to:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "my_tokenizer"
                }
            },
            "tokenizer": {
                "my_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 3,            // note this
                    "max_gram": 10,
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            }
        },
        "max_ngram_diff": 10
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "my_analyzer"
            }
        }
    }
}
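Since the question defines its index through the Ruby `settings`/`mappings` DSL, here is a hedged sketch of how the JSON above might be carried over. The constant name `ANALYSIS` is an assumption made for illustration, and the DSL call is shown only as a comment because it needs a model that includes Elasticsearch::Model. Like the JSON, this analyzer has no lowercase filter, unlike the original edge_ngram_analyzer.

```ruby
require "json"

# Revised analysis settings from the answer, as a plain Ruby hash.
# ANALYSIS is a hypothetical constant name, not part of any library.
ANALYSIS = {
  analyzer: {
    my_analyzer: { tokenizer: "my_tokenizer" }
  },
  tokenizer: {
    my_tokenizer: {
      type: "edge_ngram",
      min_gram: 3, # raised from 2 so that "he" is no longer indexed
      max_gram: 10,
      token_chars: %w[letter digit]
    }
  }
}.freeze

# Assumed usage inside the model from the question (not executed here):
#
#   settings index: { max_ngram_diff: 10 }, analysis: ANALYSIS do
#     mappings dynamic: true do
#       indexes :content, type: :text, analyzer: "my_analyzer"
#     end
#   end

# Print the hash as JSON to compare it with the request body above.
puts JSON.pretty_generate(settings: { analysis: ANALYSIS })
```

The index has to be recreated and the documents reindexed before a changed analyzer takes effect.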

Now use the analyze API:

GET /_analyze

{
  "analyzer" : "my_analyzer",
  "text" : "hey"
}

The tokens will be:

{
  "tokens": [
    {
      "token": "hey",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    }
  ]
}

Index data:

{
  "content": "hey"
}
{
  "content": "hello"
}

Search query:

{
  "query":{
    "match":{
      "content":"hey"
    }
  }
}

Search result:

"hits": [
      {
        "_index": "66754045",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.8713851,
        "_source": {
          "content": "hey"
        }
      }
    ]
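The effect of the larger min_gram on this result can be checked with a small simulation. This is a sketch under stated assumptions rather than a reproduction of Elasticsearch scoring: `edge_ngrams` and `matching_docs` are hypothetical helpers, words are taken to be runs of letters and digits (mirroring token_chars), and a match is modeled as sharing at least one token with the analyzed query.

```ruby
# Prefixes of each word, from min_gram up to max_gram characters.
# Words are runs of letters and digits; nothing is lowercased here,
# which is harmless for the all-lowercase sample data.
def edge_ngrams(text, min_gram:, max_gram:)
  text.scan(/[[:alnum:]]+/).flat_map do |word|
    (min_gram..[word.length, max_gram].min).map { |len| word[0, len] }
  end
end

# Models a match query with the default OR operator: a document is
# returned when it shares at least one token with the analyzed query.
def matching_docs(query, docs, **opts)
  query_tokens = edge_ngrams(query, **opts)
  docs.select { |doc| (edge_ngrams(doc, **opts) & query_tokens).any? }
end

docs = ["hey", "hello"]
p matching_docs("hey", docs, min_gram: 2, max_gram: 20)  # original gram sizes: ["hey", "hello"]
p matching_docs("hey", docs, min_gram: 3, max_gram: 10)  # revised gram sizes: ["hey"]
p matching_docs("help", docs, min_gram: 3, max_gram: 10) # shares "hel": ["hello"]
```

The last line shows the limit of this approach: because the query is still split into edge n-grams, any two words that share their first three characters will continue to match each other.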