Question

Suppose I have an index and add some documents to it with the following request:

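POST test/item/_bulk
{"index": {"_id": "1"}}
{"id": 1, "text": "one two"}
{"index": {"_id": "2"}}
{"id": 2, "text": "one two three"}
{"index": {"_id": "3"}}
{"id": 3, "text": "three one two"}
{"index": {"_id": "4"}}
{"id": 4, "text": "three one two four"}
{"index": {"_id": "5"}}
{"id": 5, "text": "one two|"}
{"index": {"_id": "6"}}
{"id": 6, "text": "|one two"}
{"index": {"_id": "7"}}
{"id": 7, "text": "|one two|"}
{"index": {"_id": "8"}}
{"id": 8, "text": "one|two"}
{"index": {"_id": "9"}}
{"id": 9, "text": "one| two"}
{"index": {"_id": "10"}}
{"id": 10, "text": "one |two"}
{"index": {"_id": "11"}}
{"id": 11, "text": "one | two"}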

I want this search:

GET test/item/_search
{
    "query": 
    {
        "query_string": 
        {
            "query": "\"one two\"",
            "fields": ["text"],
            "analyze_wildcard": "true",
            "allow_leading_wildcard": "true",
            "default_operator": "AND"
        }
    }
}

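to return documents 1-7.

I have tried various analyzers and tokenizers (standard, whitespace, etc.) on both the documents and the query, but none of them gives me the result I want.

For example, the standard analyzer returns all documents, while the whitespace analyzer returns only documents 1-4.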
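A rough way to see why this happens is to simulate the two tokenizers in plain Python. This is only a sketch under assumptions: the regular expression `\w+` approximates what the standard analyzer does with this sample data, `str.split()` stands in for the whitespace analyzer, and `phrase_match` and `matching_ids` are hypothetical helpers, not Elasticsearch APIs.

```python
import re

DOCS = {
    1: "one two", 2: "one two three", 3: "three one two",
    4: "three one two four", 5: "one two|", 6: "|one two",
    7: "|one two|", 8: "one|two", 9: "one| two",
    10: "one |two", 11: "one | two",
}


def standard_tokens(text):
    # Approximation: keep runs of word characters, drop punctuation such as "|".
    return re.findall(r"\w+", text.lower())


def whitespace_tokens(text):
    # Split on whitespace only, so "two|" stays a single token.
    return text.split()


def phrase_match(tokens, phrase):
    # True if the phrase tokens occur at consecutive positions.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matching_ids(tokenize, query="one two"):
    phrase = tokenize(query)
    return [doc_id for doc_id, text in DOCS.items()
            if phrase_match(tokenize(text), phrase)]


print(matching_ids(standard_tokens))
print(matching_ids(whitespace_tokens))
```

Under these assumptions the first call lists all eleven ids, because the `|` disappears and "one" and "two" become adjacent everywhere. The second lists only 1 to 4, because tokens such as `two|` and `|one` no longer equal `two` and `one`.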

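Is there an analyzer/tokenizer/parameter that will return the desired result?

Note: to be clear, my data contains both short and very long strings with no common characteristics. The words (one, two, three, four) and the symbol (|) I used are only for convenience and can be replaced by any other words and non-word characters.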

Answer 1

You should use a range query. Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

Answer 2

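I think you should try the Pattern analyzer to analyze your data. It lets you specify a regular expression that defines the pattern used to split the text.

You should create a custom analyzer and define a pattern to tokenize your data.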

https://www.elastic.co/guide/en/elasticsearch/guide/master/custom-analyzers.html
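As a sketch of the idea, the following plain Python simulates a pattern that emits every word and every non-word character as its own token, so that a `|` sitting between "one" and "two" breaks the phrase. The regular expression `\w+|[^\w\s]` and the helper names are assumptions made for illustration; they are not taken from the Elasticsearch documentation, and an actual analyzer configuration would still need to be tested against the index.

```python
import re

DOCS = {
    1: "one two", 2: "one two three", 3: "three one two",
    4: "three one two four", 5: "one two|", 6: "|one two",
    7: "|one two|", 8: "one|two", 9: "one| two",
    10: "one |two", 11: "one | two",
}

# Hypothetical token pattern: a run of word characters, or one single
# character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def pattern_tokens(text):
    # "one|two" becomes ["one", "|", "two"]; "one two|" becomes ["one", "two", "|"].
    return TOKEN_PATTERN.findall(text)


def phrase_match(tokens, phrase):
    # True if the phrase tokens occur at consecutive positions.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matching_ids(query="one two"):
    phrase = pattern_tokens(query)
    return [doc_id for doc_id, text in DOCS.items()
            if phrase_match(pattern_tokens(text), phrase)]


print(matching_ids())
```

With this tokenization the simulated phrase search selects documents 1 to 7 and skips 8 to 11, which is the result the question asks for.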

Answer 3

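Sorry, I did not understand you yesterday. There is a solution, though I am not sure whether it is optimal. First, create a dynamic template that sets your string fields to not_analyzed: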

curl -XPOST 'localhost:9200/_template/template1' -d '
{
    "template":"test_*",
    "mappings": {
        "item": {
            "dynamic_templates": [
                {
                    "strings": {
                        "mapping": { 
                            "index": "not_analyzed",
                            "type": "string"
                        },
                        "match_mapping_type": "string"
                    }
                }
            ]
        }
    },
    "aliases": {}
}'

Then I inserted the following documents:

curl -XPOST localhost:9200/test_1/item/1 -d '{ "text": "one two"}'
curl -XPOST localhost:9200/test_1/item/2 -d '{ "text": "one two three"}'
curl -XPOST localhost:9200/test_1/item/3 -d '{ "text": "three one two"}'
curl -XPOST localhost:9200/test_1/item/4 -d '{ "text": "three one two four"}'
curl -XPOST localhost:9200/test_1/item/5 -d '{ "text": "one two|"}'
curl -XPOST localhost:9200/test_1/item/6 -d '{ "text": "|one two"}'
curl -XPOST localhost:9200/test_1/item/7 -d '{ "text": "|one two|"}'
curl -XPOST localhost:9200/test_1/item/8 -d '{ "text": "one|two"}'
curl -XPOST localhost:9200/test_1/item/9 -d '{ "text": "one| two"}'
curl -XPOST localhost:9200/test_1/item/10 -d '{ "text": "one |two"}'

With a wildcard query you can return the required rows:

curl localhost:9200/test_1/_search -d '
{
"query": {
    "wildcard" : {
        "text" : "*one two*"
    }
}
}'

This query returns 7 rows:

nugusbayevkk@mediator:/data/databases/elasticsearch-5.2.2/bin$ curl localhost:9200/test_1/_search?filter_path=hits.total -d '
{
"query": {
    "wildcard" : {
        "text" : "*one two*"
    }
}
}'
{"hits":{"total":7}}

?filter_path lets us display only selected fields of the response; in this case it shows only the total number of hits.
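To illustrate why the count is 7, here is a minimal Python sketch of the matching. It assumes that a not_analyzed field is indexed as one single term, so the wildcard is compared against the whole string. Python's `fnmatchcase` is used as a stand-in for the wildcard query (it also gives `[` a special meaning, which does not matter for this pattern); `wildcard_hits` is a hypothetical helper, not part of any Elasticsearch client.

```python
from fnmatch import fnmatchcase

# The ten values inserted above, each treated as one unanalyzed term.
TEXTS = [
    "one two", "one two three", "three one two", "three one two four",
    "one two|", "|one two", "|one two|",
    "one|two", "one| two", "one |two",
]


def wildcard_hits(pattern, values):
    # "*" matches any sequence of characters, so "*one two*" behaves
    # like a case-sensitive substring test on the whole value.
    return [value for value in values if fnmatchcase(value, pattern)]


hits = wildcard_hits("*one two*", TEXTS)
print(len(hits))
```

The three values where a `|` sits directly next to "one" or "two" in the middle do not contain the substring "one two", so they are left out, in line with the total shown above.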

Elasticsearch query_string - exact phrase problem

3 answers: