Question

Sample data:

PUT /test/test/1
{
    "text1":"cats meow",
    "text2":"12345",
    "text3":"toy"
}

PUT /test/test/2
{
    "text1":"dog bark",
    "text2":"98765",
    "text3":"toy"
}

An example query:

GET /test/test/_search
{
    "size": 25,
    "query": {
        "multi_match" : {
            "fields" : [
                "text1", 
                "text2",
                "text3"
            ],
            "query" : "meow cats toy",
            "type" : "cross_fields"
        }
    }
}

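This returns the cat hit first and then the dog, which is what I want.

But when you query for cat toy, the cat and the dog get the same relevance score. I want to be able to take the prefix of that word into account (and possibly of a few other words in that field) and still run cross_fields.

So if I search: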

GET /test/test/_search
{
    "size": 25,
    "query": {
        "multi_match" : {
            "fields" : [
                "text1", 
                "text2",
                "text3"
            ],
            "query" : "cat toy",
            "type" : "phrase_prefix"
        }
    }
}

or

GET /test/test/_search
{
    "size": 25,
    "query": {
        "multi_match" : {
            "fields" : [
                "text1", 
                "text2",
                "text3"
            ],
            "query" : "meow cats",
            "type" : "phrase_prefix"
        }
    }
}

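I should get the cat / ID 1 back, but I don't.

I have found that cross_fields handles multi-word phrases but not multiple incomplete phrases, while phrase_prefix handles an incomplete phrase but not multiple incomplete phrases...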
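
For readers who want to see why both phrase_prefix queries miss document 1, here is a rough Python sketch of the matching rule. It assumes the usual phrase-prefix behaviour (every term except the last must match exactly and in order within one field, and only the last term is treated as a prefix) and uses a plain whitespace split in place of a real analyzer. The function name phrase_prefix_match is hypothetical, and options such as slop and max_expansions are ignored.

```python
def phrase_prefix_match(field_text, query):
    """Toy model of a phrase-prefix match against a single field.

    Assumption: terms are produced by lowercasing and splitting on whitespace.
    """
    field_terms = field_text.lower().split()
    query_terms = query.lower().split()
    if not query_terms:
        return False
    *exact, last = query_terms
    width = len(query_terms)
    # Slide a window over the field and compare it with the query terms.
    for start in range(len(field_terms) - width + 1):
        window = field_terms[start:start + width]
        if window[:-1] == exact and window[-1].startswith(last):
            return True
    return False


# "cat" is not the exact term "cats", and "toy" lives in a different field.
print(phrase_prefix_match("cats meow", "cat toy"))
# The terms exist in text1 but in the opposite order.
print(phrase_prefix_match("cats meow", "meow cats"))
# Only the final term may be incomplete.
print(phrase_prefix_match("cats meow", "cats me"))
```

Under this model the first two calls report no match and the third reports a match, which is consistent with the behaviour described above.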

Sifting through the documentation has not really helped me discover how to combine the two.

Answer 1

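Yeah, I had to use an analyzer...

An analyzer is applied to the fields when you create the index, before you add any data. I couldn't find an easier way to do this once the data has been added.

The solution I found is to explode every phrase into each of its individual prefixes so that cross_fields can do its work. You can learn more about the use of edge-ngram here.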
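
To make the prefix expansion concrete, here is a minimal Python sketch of what an edge n-gram filter does to a field value. It is only an approximation: lowercasing plus a whitespace split stands in for the real tokenizer, the function name edge_ngrams is hypothetical, and the bounds of 1 and 20 mirror the min_gram and max_gram values used in this answer.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Return every leading prefix of each word, shortest first.

    Assumption: words are found by lowercasing and splitting on whitespace.
    """
    tokens = []
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        for size in range(min_gram, longest + 1):
            tokens.append(word[:size])
    return tokens


print(" ".join(edge_ngrams("cats meow")))  # c ca cat cats m me meo meow
```

Because every prefix is stored as its own term, a query term such as cat can match the indexed value cats as an ordinary term match.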

So instead of cross_fields searching only for the phrase cats, it will now search for c, ca, cat, and cats, and likewise for every phrase after it. The text1 field will therefore look like this to Elasticsearch:

c ca cat cats m me meo meow

~~~

Here are the steps that make the example from the question work:

First, create and name the analyzer. To learn more about what the filter values mean, I recommend taking a look at this.

PUT /test
{
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    }
}

Then I attached this analyzer to each field. I changed text1 to match the field I was applying it to.

PUT /test/_mapping/test
{
    "test": {
        "properties": {
            "text1": {
                "type": "string",
                "analyzer": "autocomplete"
            }
        }
    }
}

I ran GET /test/_mapping to make sure everything worked.

Then add the data:

POST /test/test/_bulk
{ "index": { "_id": 1 }}
{ "text1": "cats meow", "text2": "12345", "text3": "toy" }
{ "index": { "_id": 2 }}
{ "text1": "dog bark", "text2": "98765", "text3": "toy" }

And search!

GET /test/test/_search
{
    "size": 25,
    "query": {
        "multi_match" : {
            "fields" : [
                "text1", 
                "text2",
                "text3"
            ],
            "query" : "cat toy",
            "type" : "cross_fields"
        }
    }
}

Which returns:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.70778143,
        "hits": [
            {
                "_index": "test",
                "_type": "test",
                "_id": "1",
                "_score": 0.70778143,
                "_source": {
                    "text1": "cats meow",
                    "text2": "12345",
                    "text3": "toy"
                }
            },
            {
                "_index": "test",
                "_type": "test",
                "_id": "2",
                "_score": 0.1278426,
                "_source": {
                    "text1": "dog bark",
                    "text2": "98765",
                    "text3": "toy"
                }
            }
        ]
    }
}

This creates separation between the two hits when you search for cat toy, whereas before their scores were the same. Now the cat hit scores higher, as it should. This is achieved by taking every prefix of each phrase into account (up to 20 characters in this case) and then seeing how relevant the data is with cross_fields.
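
As a rough illustration of why document 1 now pulls ahead, the following Python sketch counts which analyzed query terms are found in each field. It is a toy overlap count, not Elasticsearch's actual scoring. It assumes the query string is analyzed with the same analyzer as the field it is matched against (the default when no separate search analyzer is configured), that only text1 uses the edge n-gram analyzer as in the mapping from this answer, and that a whitespace split approximates the tokenizer. The names edge_ngrams, plain_terms, and matched_terms are hypothetical.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Every leading prefix of each lowercased, whitespace-separated word."""
    tokens = []
    for word in text.lower().split():
        for size in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:size])
    return tokens


def plain_terms(text):
    """Stand-in for the default analyzer: lowercase and split on whitespace."""
    return text.lower().split()


docs = {
    1: {"text1": "cats meow", "text2": "12345", "text3": "toy"},
    2: {"text1": "dog bark", "text2": "98765", "text3": "toy"},
}

# Assumption: only text1 was mapped to the autocomplete analyzer.
analyzers = {"text1": edge_ngrams, "text2": plain_terms, "text3": plain_terms}


def matched_terms(doc, query):
    """Collect (field, term) pairs where an analyzed query term is indexed."""
    matched = set()
    for field, analyze in analyzers.items():
        indexed = set(analyze(doc[field]))
        for term in analyze(query):
            if term in indexed:
                matched.add((field, term))
    return matched


for doc_id, doc in docs.items():
    print(doc_id, sorted(matched_terms(doc, "cat toy")))
```

In this model document 1 matches c, ca, and cat in text1 plus toy in text3, while document 2 matches only toy in text3, which is the kind of separation the scores above reflect.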

Cross-field search with multiple complete and incomplete phrases in each field
