Question

I have added a document like this to the index:

POST /analyzer3/books
{
  "title": "The other day I went with my mom to the pool and had a lot of fun"
}

Then I run a query like this:

GET /analyzer3/_analyze
{
  "analyzer": "english",
  "text": "\"The * day I went with my * to the\""
}

It successfully returns the document I added earlier.

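My idea is to use quotes to make the query exact, while still allowing a wildcard to stand in for any word. Google has this exact feature: you can search for something like "I'm * the university" and it returns pages containing text such as I'm studying in the university right now.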

But I would like to know whether there is another way to do this.

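My main concern is that this does not seem to work for other languages such as Japanese and Chinese. I have tried many analyzers and tokenizers, to no avail.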

Any answer is appreciated.

Answer 1

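Exact matching on a tokenized (analyzed) field is not that straightforward. If you have this requirement, it is better to also store the field as keyword.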

In addition, the keyword data type supports wildcard queries, which can help with your wildcard search.

So just create a subfield of type keyword, then run a wildcard query against it.

Your search query would look like this:

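GET /_search
{
  "query": {
    "wildcard": {
      "title.keyword": "The * day I went with my * to the*"
    }
  }
}

The query above assumes that the title field has a subfield named keyword whose data type is keyword. Because a wildcard query has to match the entire indexed term, which for a keyword field is the whole title, the pattern ends with * so that it also covers the rest of the sentence.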
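
For reference, here is a minimal sketch of an index body that would give title such a keyword subfield, written as a Python dict so it can be serialized and sent with any client. It assumes Elasticsearch 7 or later, where mappings are typeless (older versions nest properties under the mapping type name); the helper name title_mapping is hypothetical, and the index name analyzer3 comes from the question.

```python
import json


def title_mapping() -> dict:
    """Hypothetical helper: index body where `title` is analyzed as text
    and also stored verbatim in a `title.keyword` subfield (multi-field)."""
    return {
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "fields": {
                        # Unanalyzed copy of the whole title, usable by wildcard queries.
                        "keyword": {"type": "keyword"}
                    },
                }
            }
        }
    }


body = title_mapping()
# Body for a request such as PUT /analyzer3 (assumed target index).
print(json.dumps(body, indent=2))
```

An existing field cannot have its type changed in place, so documents indexed before the subfield was added would need to be reindexed before title.keyword is searchable for them.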

For more information on wildcard queries, see the Elasticsearch wildcard query documentation.

If you still want to run exact searches against a text field, read this.

Answer 2

Elasticsearch does not offer Google-style search out of the box, but you can build something similar.

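Let's assume that when someone puts the search text in quotes, what they want is a match phrase query. Basically, remove the \" and search for the rest of the string as a phrase.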

PUT test/_doc/1
{
  "title": "The other day I went with my mom to the pool and had a lot of fun"
}

GET test/_search
{
  "query": {
    "match_phrase": {
      "title": "The other day I went with my mom to the pool and had a lot of fun"
    }
  }
}
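
The quote-stripping step could be done on the client before the request is sent. The following is only a sketch under that assumption: phrase_query is a hypothetical helper, not part of any Elasticsearch client, and it simply produces the same match_phrase body as the request above.

```python
import json


def phrase_query(field: str, user_input: str) -> dict:
    """Strip one pair of surrounding double quotes and wrap the rest in match_phrase."""
    text = user_input.strip()
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        text = text[1:-1]
    return {"query": {"match_phrase": {field: text}}}


body = phrase_query("title", '"The other day I went with my mom"')
# Request body for GET test/_search
print(json.dumps(body, indent=2))
```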

For the *, it gets more interesting. You could turn the input into multiple phrase searches and combine them. Example:

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "title": "The"
          }
        },
        {
          "match_phrase": {
            "title": "day I went with my"
          }
        },
        {
          "match_phrase": {
            "title": "to the"
          }
        }
      ]
    }
  }
}
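
Splitting the input on * can be automated. This is a minimal sketch, assuming the user input is a single quoted string and that * only ever appears as a standalone placeholder; split_phrase_query is a hypothetical name.

```python
import json


def split_phrase_query(field: str, user_input: str) -> dict:
    """Turn '"a * b c * d"' into a bool/must of one match_phrase per fragment."""
    text = user_input.strip().strip('"')
    # Cut at every wildcard, then drop surrounding whitespace and empty pieces.
    fragments = [part.strip() for part in text.split("*")]
    clauses = [{"match_phrase": {field: part}} for part in fragments if part]
    return {"query": {"bool": {"must": clauses}}}


body = split_phrase_query("title", '"The * day I went with my * to the"')
print(json.dumps(body, indent=2))
```

Keep in mind that bool/must only requires every fragment to occur somewhere in the field; it does not enforce the order of the fragments or how far apart they are.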

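Or you could use slop in the phrase search. All terms in your search query must be present (unless they are dropped by the tokenizer or removed as stop words), but the matched phrase may contain additional words. Here each * stands for 1 other word, so that is a slop of 2 in total. If you want to allow more than one word in place of each *, you need to pick a higher slop: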

GET test/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "The * day I went with my * to the",
        "slop": 2
      }
    }
  }
}
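
Following the same reasoning, the slop can be derived from the number of * placeholders. The sketch below is an illustration only: slop_phrase_query and words_per_wildcard are hypothetical names, and the placeholders are removed from the query text on the client instead of relying on the analyzer to drop them.

```python
import json


def slop_phrase_query(field: str, user_input: str, words_per_wildcard: int = 1) -> dict:
    """Build a match_phrase query whose slop leaves room for each '*' placeholder."""
    words = user_input.strip().strip('"').split()
    wildcard_count = words.count("*")
    # Keep only the real words; the gaps are covered by slop instead.
    query_text = " ".join(word for word in words if word != "*")
    return {
        "query": {
            "match_phrase": {
                field: {
                    "query": query_text,
                    "slop": wildcard_count * words_per_wildcard,
                }
            }
        }
    }


body = slop_phrase_query("title", '"The * day I went with my * to the"')
print(json.dumps(body, indent=2))
```

Slop is an upper bound rather than an exact gap size, so the same query also matches titles where fewer words (or none) fill a placeholder position.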

Another option could be shingles, but that is a more advanced concept and I would start with the basics for now.

How can I implement this kind of query in ElasticSearch?
