Question

I have an index with settings and mappings like the following:

{
  "settings":{
     "index":{
        "analysis":{
           "analyzer":{
              "analyzer_keyword":{
                 "tokenizer":"keyword",
                 "filter":"lowercase"
              }
           }
        }
     }
  },
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "analyzer":"analyzer_keyword",
              "type":"string",
              "index": "not_analyzed"
           }
        }
     }
  }
}

I'm trying to implement a wildcard search on the name field. My sample data looks like this:

[
{"name": "SVF-123"},
{"name": "SVF-234"}
]

When I run the following query:

curl -XPOST "http://localhost:9200/my_index/product/_search" -d '
{
    "query": {
        "filtered" : {
            "query" : {
                "query_string" : {
                    "query": "*SVF-1*"
                }
            }
        }

    }
}'

it returns both SVF-123 and SVF-234. I think the data is still being tokenized. It should return only SVF-123.

Can you help?

Thanks in advance.

Answer 1

There are a few problems here.

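First, you say you don't want the terms analyzed at index time. Then you configure an analyzer (used at search time) that produces incompatible terms: they are lowercased.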

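By default, all terms also end up in the _all field, which is analyzed with the standard analyzer, and that is where you end up searching. Since the standard analyzer tokenizes on "-", you end up with an OR of "*SVF" and "1*".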
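To see why both documents come back, here is a rough shell simulation. It is a sketch under loud assumptions, not Elasticsearch itself. `approx_standard_tokens` only approximates the standard analyzer by lowercasing and splitting on anything that is not a letter or digit. `matches_any` is a hypothetical helper that combines the two wildcard clauses with OR. The clauses are written in lowercase so that they line up with the lowercased tokens.

```shell
#!/bin/sh
# Rough approximation of the standard analyzer (assumption, not Lucene's real
# algorithm): lowercase the text and split on every non-alphanumeric character.
approx_standard_tokens() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# Hypothetical helper: succeed if ANY token matches ANY wildcard clause,
# i.e. the clauses are combined with OR.
matches_any() {
  text=$1
  shift
  for token in $(approx_standard_tokens "$text"); do
    for pattern in "$@"; do
      case "$token" in
        $pattern) return 0 ;;
      esac
    done
  done
  return 1
}

for doc in "SVF-123" "SVF-234"; do
  if matches_any "$doc" '*svf' '1*'; then
    echo "$doc matches"
  else
    echo "$doc does not match"
  fi
done
```

Both documents produce the token `svf`, so the `*svf` clause on its own is enough to return SVF-234.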

Try running a terms facet on _all and on name to see what's going on.

Here is a runnable Play and gist: https://www.found.no/play/gist/3e5fcb1b4c41cfc20226 (https://gist.github.com/alexbrasetvik/3e5fcb1b4c41cfc20226)

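You need to make sure the terms you index are compatible with what you search for. You probably want to disable _all, because it can obscure what is actually going on.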
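One way to apply that advice is sketched below. It assumes Elasticsearch 1.x-era mapping syntax (`"type": "string"`, `"_all": { "enabled": false }`) and reuses the question's `product` type. It disables _all and analyzes name *only* with analyzer_keyword, dropping the conflicting `not_analyzed`, so indexing produces a single lowercased term. The `index_body` function name is hypothetical.

```shell
#!/bin/sh
# Sketch (assumes Elasticsearch 1.x-era mappings): disable _all for "product"
# and analyze "name" only with analyzer_keyword, so the index holds a single
# lowercased term per value instead of a verbatim, not_analyzed one.
index_body() {
  cat <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "analyzer_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "_all": { "enabled": false },
      "properties": {
        "name": { "type": "string", "analyzer": "analyzer_keyword" }
      }
    }
  }
}
EOF
}

index_body
```

You could send it with `index_body | curl -XPUT "http://localhost:9200/my_index" -d @-`. With _all disabled, queries must name the field explicitly, for example with `"fields": ["name"]` in a query_string query.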

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {
        "analysis": {
            "text": [
                "SVF-123",
                "SVF-234"
            ],
            "analyzer": {
                "analyzer_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "type": {
            "properties": {
                "name": {
                    "type": "string",
                    "index": "not_analyzed",
                    "analyzer": "analyzer_keyword"
                }
            }
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-123"}
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-234"}
'

# Do searches

# See all the generated terms.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "facets": {
        "name": {
            "terms": {
                "field": "name"
            }
        },
        "_all": {
            "terms": {
                "field": "_all"
            }
        }
    }
}
'

# Analyzed, so no match
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "match": {
            "name": {
                "query": "SVF-123"
            }
        }
    }
}
'

# Not analyzed according to `analyzer_keyword`, so matches. (Note: term, not match)
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "term": {
            "name": {
                "value": "SVF-123"
            }
        }
    }
}
'


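# _all is analyzed with the standard analyzer, so it holds the lowercased token "svf" and this term query matches.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '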
{
    "query": {
        "term": {
            "_all": {
                "value": "svf"
            }
        }
    }
}
'

Answer 2

My solution adventure

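I started from the setup you can see in my question. Whenever I changed part of the settings, one part started working and another stopped. Here is the history of my solution: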

1.) I indexed my data with the default settings, which means my data was analyzed. This caused my problem. For example:

When a user starts searching for a keyword such as SVF-1, the system runs this query:

{
    "query": {
        "filtered" : {
            "query" : {
                "query_string" : {
                    "analyze_wildcard": true,
                    "query": "*SVF-1*"
                }
            }
        }

    }
}

And the result:

SVF-123
SVF-234

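This is expected, because the name field of my documents is analyzed. The query is split into the tokens SVF and 1; SVF matches my documents, but 1 does not. So I moved away from this approach and created a mapping that makes my fields not_analyzed: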

{
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "type":"string",
              "index": "not_analyzed"
           },
           "site":{
              "type":"string",
              "index": "not_analyzed"
           } 
        }
     }
  }
}

But my problem persisted.

2.) After a lot of research, I decided to try another approach: a wildcard query. My query is:

{
    "query": {
        "filtered": {
            "query": {
                "wildcard": {
                    "name": {
                        "value": "*SVF-1*"
                    }
                }
            },
            "filter": {
                "term": {"site": "pro_en_GB"}
            }
        }
    }
}

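This query works, but there is a catch. My field is no longer analyzed and I am running a wildcard query, so case sensitivity becomes a problem: if I search for svf-1, nothing is returned. Users may well type the lowercase version of the query.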
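The case-sensitivity problem can be illustrated with a small simulation. It assumes that a not_analyzed field stores the whole value verbatim as one term and that a wildcard query does not analyze its pattern, so matching behaves like a case-sensitive glob over the raw value. `wildcard_matches` is a hypothetical helper, not an Elasticsearch API.

```shell
#!/bin/sh
# Simulation only: compare a raw, not_analyzed term against an unanalyzed
# wildcard pattern using a case-sensitive shell glob.
wildcard_matches() {
  case "$1" in
    $2) return 0 ;;
  esac
  return 1
}

for pattern in '*SVF-1*' '*svf-1*'; do
  for term in "SVF-123" "SVF-234"; do
    if wildcard_matches "$term" "$pattern"; then
      echo "$pattern -> $term"
    fi
  done
done
```

Only `*SVF-1* -> SVF-123` is printed. The lowercase pattern matches nothing, which is the behaviour described above.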

3.) I changed my document structure to:

{
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "type":"string",
              "index": "not_analyzed"
           },
           "nameLowerCase":{
              "type":"string",
              "index": "not_analyzed"
           },
           "site":{
              "type":"string",
              "index": "not_analyzed"
           } 
        }
     }
  }
}

Alongside name, I added a field called nameLowerCase. When I index a document, I populate it like this:

{
    "name": "SVF-123",
    "nameLowerCase": "svf-123",
    "site": "pro_en_GB"
}

Then I convert the search keyword to lowercase, run the search against the new nameLowerCase field, and display the name field.
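The client-side half of this approach can be sketched in shell. This is a sketch under assumptions: the helper names `to_lower`, `build_product_doc` and `build_search_body` are hypothetical, the inputs are assumed to contain no characters that need JSON escaping, and the `filtered` query form is Elasticsearch 1.x-era syntax.

```shell
#!/bin/sh
# Lowercase a string (ASCII letters; locale-dependent for others).
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Document body: keep the original name for display and store a lowercased
# copy in nameLowerCase for searching. Assumes no JSON-special characters.
build_product_doc() {
  printf '{"name":"%s","nameLowerCase":"%s","site":"%s"}\n' \
    "$1" "$(to_lower "$1")" "$2"
}

# Search body: lowercase the user's keyword, wrap it in *...* and run a
# wildcard query on nameLowerCase, filtered by site.
build_search_body() {
  printf '{"query":{"filtered":{"query":{"wildcard":{"nameLowerCase":{"value":"*%s*"}}},"filter":{"term":{"site":"%s"}}}}}\n' \
    "$(to_lower "$1")" "$2"
}

build_product_doc "SVF-123" "pro_en_GB"
build_search_body "SVF-1" "pro_en_GB"
```

A body built this way could be sent with, for example, `build_search_body "SVF-1" "pro_en_GB" | curl -XPOST "http://localhost:9200/my_index/product/_search" -d @-`.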

The final version of my query is:

{
    "query": {
        "filtered": {
            "query": {
                "wildcard": {
                    "nameLowerCase": {
                        "value": "*svf-1*"
                    }
                }
            },
            "filter": {
                "term": {"site": "pro_en_GB"}
            }
        }
    }
}

Now it works. There is also another way to solve this, using multi_field. My queries contain a dash (-), which caused some problems.

Many thanks to @Alex Brasetvik for the detailed explanation and effort.

Answer 3

To add to Hüseyin's answer, we can use AND as the default operator. The *SVF and 1* clauses are then joined with AND, which gives us the correct result.

{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "default_operator": "AND",
                    "analyze_wildcard": true,
                    "query": "*SVF-1*"
                }
            }
        }
    }
}
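The effect of `default_operator: AND` can be simulated with the same rough analyzer approximation as before. This is an assumption-laden sketch, not Elasticsearch. `approx_standard_tokens` only approximates the standard analyzer, and `matches_all` is a hypothetical helper that requires every wildcard clause to match at least one token. The third document, `SVF-234 100`, is hypothetical. It shows that AND does not require the two pieces to be adjacent.

```shell
#!/bin/sh
# Rough approximation of the standard analyzer (assumption): lowercase and
# split on every non-alphanumeric character.
approx_standard_tokens() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# Hypothetical helper: succeed only if EVERY clause matches some token (AND).
matches_all() {
  text=$1
  shift
  for pattern in "$@"; do
    found=no
    for token in $(approx_standard_tokens "$text"); do
      case "$token" in
        $pattern) found=yes; break ;;
      esac
    done
    [ "$found" = yes ] || return 1
  done
  return 0
}

# "SVF-234 100" is a made-up document: svf and 100 satisfy both clauses.
for doc in "SVF-123" "SVF-234" "SVF-234 100"; do
  if matches_all "$doc" '*svf' '1*'; then
    echo "$doc matches"
  else
    echo "$doc does not match"
  fi
done
```

SVF-234 is now excluded, but the made-up `SVF-234 100` still matches.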

Answer 4

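@Viduranga Wijesooriya, as you described, "default_operator" : "AND" checks that both SVF and 1 are present, but it still cannot produce an exact match. It does filter the results more sensibly: every combination of SVF and 1 is considered and the results are sorted by relevance, which pushes SVF-1 upward.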

To get the exact result, use these settings and mapping:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "analyzer_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "type": {
            "properties": {
                "name": {
                    "type": "string",
                    "analyzer": "analyzer_keyword"
                }
            }
        }
    }
}

and this query:

{
    "query": {
        "bool": {
            "must": [
               {
                    "query_string" : {
                        "fields": ["name"],
                        "query" : "*svf-1*",
                        "analyze_wildcard": true
                    }
               }
            ]
        }
    }
}

Result:

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "play",
            "_type": "type",
            "_id": "AVfXzn3oIKphDu1OoMtF",
            "_score": 1,
            "_source": {
               "name": "SVF-123"
            }
         }
      ]
   }
}
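Here is why this mapping returns only SVF-123, as a simulation under stated assumptions. The custom analyzer_keyword (keyword tokenizer plus lowercase filter) is assumed to index each whole value as one lowercased term, so the hyphen is never a split point and the wildcard is compared against the full term. The answer's query is already lowercase; the sketch simply lowercases the pattern itself. `keyword_lowercase_term` is a hypothetical helper.

```shell
#!/bin/sh
# Simulation only: keyword tokenizer + lowercase filter => one lowercased term.
keyword_lowercase_term() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Lowercase the wildcard pattern the same way the indexed terms were.
pattern=$(keyword_lowercase_term '*SVF-1*')

for doc in "SVF-123" "SVF-234"; do
  term=$(keyword_lowercase_term "$doc")
  case "$term" in
    $pattern) echo "$doc matches" ;;
    *) echo "$doc does not match" ;;
  esac
done
```

Only SVF-123 matches, consistent with the single hit shown above.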

Elasticsearch wildcard search on a not_analyzed field
