Question

For example, I have a record whose content is "FileV2UpdateRequest", and my analyzer splits it into these tokens:

filev
2
updaterequest

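I want to be able to find it by searching for filev2update* in a "query_string" query, but for whatever reason the * does not try to match the rest of "updaterequest".

If I enter the query filev2 update*, results are returned.

Is there any way to make this work without needing the space?

I tried setting auto_generate_phrase_queries to true, but that did not solve the problem either. It looks like when you add a wildcard symbol, the whole input is treated as a single token, rather than the wildcard applying only to the token it touches.

If I add analyze_wildcard and set it to true, it tries to put a * on every token in the query: costv * 2 * add *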
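The behaviour described above can be illustrated with a small model. This is a minimal sketch in plain Python, not Elasticsearch code: it assumes that a wildcard pattern is not analyzed and is compared against each indexed term on its own. The helper `wildcard_hits` is hypothetical and only mimics that comparison with `fnmatch`.

```python
from fnmatch import fnmatchcase

# Terms the question's analyzer produced for "FileV2UpdateRequest".
indexed_terms = ["filev", "2", "updaterequest"]

def wildcard_hits(pattern, terms):
    """Return the indexed terms that one wildcard pattern matches.

    Hypothetical helper: models a term-level wildcard as a glob
    applied to every indexed term separately.
    """
    return [t for t in terms if fnmatchcase(t, pattern)]

# The whole input is one pattern, and no single term starts with "filev2update".
print(wildcard_hits("filev2update*", indexed_terms))  # []

# With the space, "update*" is its own pattern and matches one term.
print(wildcard_hits("update*", indexed_terms))  # ['updaterequest']
```

Under that assumption, filev2update* cannot match because the record never contains a single term spanning "filev", "2" and "update".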

Answer 1

I think you can change the index filter and use word_delimiter to index the content (Compound Word Token Filter).
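As a rough sketch of what such a change might look like, the settings body below defines a custom analyzer that applies the word_delimiter filter. The analyzer name `my_word_delimiter_analyzer` and the choice of the whitespace tokenizer are assumptions made for illustration; the answer does not say which tokenizer was used.

```python
import json

# Hypothetical index settings: a custom analyzer using word_delimiter.
# The analyzer name and the whitespace tokenizer are assumptions.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_word_delimiter_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["word_delimiter"],
                }
            }
        }
    }
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(index_settings, indent=4))
```

The field that holds the content would then need to reference this analyzer in its mapping.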

If you use this filter, FileV2UpdateRequest will be analyzed into the tokens:

{
    "tokens": [{
        "token": "File",
        "start_offset": 0,
        "end_offset": 4,
        "type": "word",
        "position": 1
    }, {
        "token": "V",
        "start_offset": 4,
        "end_offset": 5,
        "type": "word",
        "position": 2
    }, {
        "token": "2",
        "start_offset": 5,
        "end_offset": 6,
        "type": "word",
        "position": 3
    }, {
        "token": "Update",
        "start_offset": 6,
        "end_offset": 12,
        "type": "word",
        "position": 4
    }, {
        "token": "Request",
        "start_offset": 12,
        "end_offset": 19,
        "type": "word",
        "position": 5
    }]
}
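The split shown above can be approximated outside Elasticsearch. The sketch below is a regex-based imitation, not the filter's real implementation: it assumes the filter breaks on case changes and on letter/digit boundaries, and the function name `split_like_word_delimiter` is hypothetical.

```python
import re

# Rough approximation of the splitting rules:
# - an optional capital followed by lowercase letters ("File", "Update")
# - a run of capitals not followed by a lowercase letter ("V")
# - a run of digits ("2")
TOKEN_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")

def split_like_word_delimiter(text):
    """Split text at case changes and letter/digit boundaries."""
    return TOKEN_RE.findall(text)

print(split_like_word_delimiter("FileV2UpdateRequest"))
# ['File', 'V', '2', 'Update', 'Request']
```

This reproduces the five tokens listed in the answer, but it ignores offsets, positions and the filter's configurable options.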

For the search content, you also need to use word_delimiter as the filter, without using the wildcard.

filev2update will be analyzed into the tokens:

{
    "tokens": [{
        "token": "file",
        "start_offset": 0,
        "end_offset": 4,
        "type": "word",
        "position": 1
    }, {
        "token": "V",
        "start_offset": 4,
        "end_offset": 5,
        "type": "word",
        "position": 2
    }, {
        "token": "2",
        "start_offset": 5,
        "end_offset": 6,
        "type": "word",
        "position": 3
    }, {
        "token": "update",
        "start_offset": 6,
        "end_offset": 12,
        "type": "word",
        "position": 4
    }]
}

ElasticSearch query_string query wildcard on a query with multiple tokens
