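ElasticSearch: How to search on different, unrelated fields that are arrays of objects

Time: 2015-06-17 18:53:47

Tags: elasticsearch

I want to search on different, unrelated fields that are arrays of objects, and I cannot figure out how.

Given the following mapping and data entry, I want users to be able to search on all possible fields in any combination. Users will fill in a form with a keyword input, an exclude-keywords input, a date range, and multi-select dropdowns. What would this query look like? I have included several failed queries and filters below the data entry.

Mapping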

{
    "plants" : {
        "properties" : {
            "name" : {"type" : "string"},
            "description" : {"type" : "string"},
            "planting" : {"type" : "string"},
            "maintenance" : {"type" : "string"},
            "type" : {"type" : "string"},
            "petals" : {
                "properties" : {
                    "color" : {"type" : "string"}
                }
            },
            "species" : {
                "properties" : {
                    "name" : {"type" : "string"},
                    "subspecies" : {
                        "properties" : {
                            "name" : {"type" : "string"}
                        }
                    }
                }
            },
            "pests" : {
                "properties" : {
                    "pest" : {"type" : "string"}
                }
            },
            "diseases" : {
                "properties" : {
                    "disease" : {"type" : "string"}
                }
            }
        }
    }
}

Data entry: Rose

{
    "name" : "Rose",
    "description" : "A few paragraphs of text",
    "planting" : "A few paragraphs of text",
    "maintenance" : "A few paragraphs of text",
    "type" : "Perennial",
    "petals" : [
        {"color" : "Red"},
        {"color" : "White"},
        {"color" : "Yellow"},
        {"color" : "Pink"},
        {"color" : "Orange"},
        {"color" : "Purple"}
    ],
    "species" : [
        {
            "name" : "Hulthemia",
            "description" : "A few paragraphs of text",
            "subspecies" : []
        },
        {
            "name" : "Hesperrhodos",
            "description" : "A few paragraphs of text",
            "subspecies" : []
        },
        {
            "name" : "Platyrhodon",
            "description" : "A few paragraphs of text",
            "subspecies" : []
        },
        {
            "name" : "Rosa",
            "description" : "A few paragraphs of text",
            "subspecies" : [
                {"name" : "Banksianae"},
                {"name" : "Bracteatae"},
                {"name" : "Caninae"},
                {"name" : "Carolinae"},
                {"name" : "Chinensis"},
                {"name" : "Gallicanae"},
                {"name" : "Gymnocarpae"},
                {"name" : "Laevigatae"},
                {"name" : "Pimpinellifoliae"},
                {"name" : "Cinnamomeae"},
                {"name" : "Synstylae"}
            ]
        }
    ],
    "pests" : [],
    "diseases" : []
}

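Query

For example, I have had some success with the following query, but it is not accurate on large datasets of 100k to 10M data entries (not flowers, and with many more fields). I am searching multiple fields, each with multiple exact-value matches, while still wanting a relevance score for every entry. The "minimum_should_match" option does not make sense when I want flowers whose "petals.color" is "purple", "pink" and/or "white" and I am also searching two other fields that are lists like "petals", plus a string field like "type". I could set "minimum_should_match" to 2, but a flower with several matching "petals.color" values would satisfy that requirement on its own, so I would get results whose "type" is neither "Perennial" nor "Annual", for example "Biennial". I then looked at filters, which are my next example.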

{
    "query" : {
        "bool" : {
            "must" : [
                {
                    "multi_match":{
                        "query":"disease resistant",
                        "type":"cross_fields",
                        "fields":[
                            "description",
                            "planting",
                            "maintenance",
                            "name"
                        ],
                        "tie_breaker":0.3
                    }
                }
            ],
            "must_not" : [
                {
                    "multi_match":{
                        "query":"lavender",
                        "type":"cross_fields",
                        "fields":[
                            "description",
                            "planting",
                            "maintenance",
                            "name"
                        ],
                        "tie_breaker":0.3
                    }
                }
            ],
            "should" : [
                {"match" : {"type" : "Perennial"}},
                {"match" : {"type" : "Annual"}},
                {"match" : {"petals.color" : "purple"}},
                {"match" : {"petals.color" : "pink"}},
                {"match" : {"petals.color" : "white"}}
            ]
        }
    }
}
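
The counting problem described above can be reproduced outside Elasticsearch. The following is only a simulation sketch: `count_should_matches`, `SHOULD_CLAUSES`, and the sample document are hypothetical names invented for illustration, and the matching is a plain membership test rather than real scoring. It shows that a biennial flower with two of the requested colors already reaches a flat "minimum_should_match" of 2.

```python
# Hypothetical simulation, not Elasticsearch code: each should clause is a
# (field, value) pair, and a clause "matches" when the document holds that
# value in that field.
SHOULD_CLAUSES = [
    ("type", "perennial"),
    ("type", "annual"),
    ("petals.color", "purple"),
    ("petals.color", "pink"),
    ("petals.color", "white"),
]


def count_should_matches(doc, clauses):
    """Count how many of the flat should clauses the document satisfies."""
    return sum(1 for field, value in clauses if value in doc.get(field, []))


# A biennial flower that happens to have two of the requested colors.
biennial = {"type": ["biennial"], "petals.color": ["purple", "pink"]}

matched = count_should_matches(biennial, SHOULD_CLAUSES)
# With minimum_should_match = 2 on the flat list, the two color clauses are
# enough, even though neither type clause matched.
passes_flat_minimum = matched >= 2
print(matched, passes_flat_minimum)
```

Because all five clauses share one counter, the threshold cannot tell "two colors" apart from "one type and one color".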

Query using terms

Below is an attempt using "terms". I am not sure why it does not work.

{
    "query" : {
        "bool" : {
            "must" : [
                {
                    "multi_match":{
                        "query":"disease resistant",
                        "type":"cross_fields",
                        "fields":[
                            "description",
                            "planting",
                            "maintenance",
                            "name"
                        ],
                        "tie_breaker":0.3
                    }
                },
                {
                    "terms" : {
                        "type" : ["Perennial","Annual"],
                        "minimum_should_match" : 1
                    }
                },
                {
                    "terms" : {
                        "petals.color" : ["purple","pink","white"],
                        "minimum_should_match" : 1
                    }
                }
            ],
            "must_not" : [
                {
                    "multi_match":{
                        "query":"lavender",
                        "type":"cross_fields",
                        "fields":[
                            "description",
                            "planting",
                            "maintenance",
                            "name"
                        ],
                        "tie_breaker":0.3
                    }
                }
            ],
            "should" : [

            ]
        }
    }
}
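
One possible reason the "terms" clauses miss, offered as an assumption because the index settings are not shown: "terms" compares its values against the indexed tokens verbatim, while a "string" field with the default analyzer is indexed as lowercased tokens. Under that assumption "Perennial" would never equal the indexed token "perennial". The sketch below only imitates this behavior; `simple_analyze`, `terms_matches`, and `match_matches` are hypothetical helpers, not Elasticsearch APIs.

```python
def simple_analyze(text):
    """Rough stand-in for a default analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def terms_matches(values, tokens):
    """Terms-style lookup: values are compared verbatim, with no analysis."""
    return any(value in tokens for value in values)


def match_matches(text, tokens):
    """Match-style lookup: the query text is analyzed before comparison."""
    return any(token in tokens for token in simple_analyze(text))


# Tokens assumed to be stored in the index for "type" : "Perennial".
indexed_tokens = set(simple_analyze("Perennial"))

capitalized_hit = terms_matches(["Perennial", "Annual"], indexed_tokens)
lowercase_hit = terms_matches(["perennial", "annual"], indexed_tokens)
match_hit = match_matches("Perennial", indexed_tokens)
print(capitalized_hit, lowercase_hit, match_hit)
```

If this is the cause, lowercasing the values passed to "terms" or switching to "match" would be the things to try first.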

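Query using query/filter

Below is an attempt to combine a query with a filter built from nested and/or filters. I think the problem lies in the "or" over "petals.color", where "petals.color" is a list of colors rather than a single exact value.

Another option is to list every combination of "petals.color" values to solve the "or" problem (i.e. purple+pink+white, purple+pink, purple+white, pink+white, purple, pink, white). This becomes exhaustive for lists that can hold hundreds of possible values when you are searching for a subset of them, for example a list of countries where you want to match the countries of one particular continent.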
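
How quickly the combination list grows can be checked with a few lines of Python. This is an illustrative sketch only; `non_empty_subsets` is a hypothetical helper, and nothing here talks to Elasticsearch.

```python
from itertools import combinations


def non_empty_subsets(values):
    """Return every non-empty combination of values, largest first."""
    return [
        combo
        for size in range(len(values), 0, -1)
        for combo in combinations(values, size)
    ]


colors = ["purple", "pink", "white"]
subsets = non_empty_subsets(colors)

# Three colors give the seven combinations listed above.
for combo in subsets:
    print("+".join(combo))

# In general n values give 2**n - 1 non-empty combinations.
print(len(subsets), 2 ** len(colors) - 1)
```

The count doubles with every extra value, which is why this approach stops being practical well before a list reaches hundreds of values.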

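Another option is to invert the selection on "petals.color" and put it in the "bool" "must_not". This is less work than the combination list, because Elasticsearch supports aggregations.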

{
    "query" : {
        "filtered" : {
            "query" : {
                "bool" : {
                     "must" : [
                        {
                            "multi_match":{
                                "query":"disease resistant",
                                "type":"cross_fields",
                                "fields":[
                                    "description",
                                    "planting",
                                    "maintenance",
                                    "name"
                                ],
                                "tie_breaker":0.3
                            }
                        }
                     ],
                     "must_not" : [
                        {
                            "multi_match":{
                                "query":"lavender",
                                "type":"cross_fields",
                                "fields":[
                                    "description",
                                    "planting",
                                    "maintenance",
                                    "name"
                                ],
                                "tie_breaker":0.3
                            }
                        }
                     ],
                     "should" : [
         
                     ]
                 }
            },
            "filter" : {
                "and" : [
                    {
                        "or" : [
                            {"match" : {"type" : "Perennial"}},
                            {"match" : {"type" : "Annual"}}
                        ]
                    },
                    {
                        "or" : [
                            {"match" : {"petals.color" : "purple"}},
                            {"match" : {"petals.color" : "pink"}},
                            {"match" : {"petals.color" : "white"}}
                        ]
                    }
                ]
            }
        }
    }
}

1 Answer:

Answer 0: (score: 1)

Nesting [bool][must][bool][should] isolates "minimum_should_match" to just the one list (array of objects) being searched. See the example below.

{
    "query" : {
        "bool" : {
            "must" : [
                {
                    "multi_match":{
                        "query":"disease resistant",
                        "type":"cross_fields",
                        "fields":[
                            "description",
                            "planting",
                            "maintenance",
                            "name"
                        ],
                        "tie_breaker":0.3
                    }
                },
                {
                    "bool" : {
                        "should" : [
                            {"match" : {"type" : "Perennial"}},
                            {"match" : {"type" : "Annual"}}
                        ],
                        "minimum_should_match" : 1
                    }
                },
                {
                    "bool" : {
                        "should" : [
                            {"match" : {"petals.color" : "purple"}},
                            {"match" : {"petals.color" : "pink"}},
                            {"match" : {"petals.color" : "white"}}
                        ],
                        "minimum_should_match" : 1
                    }
                }
            ],
            "must_not" : [
                {
                    "multi_match":{
                        "query":"lavender",
                        "type":"cross_fields",
                        "fields":[
                            "description",
                            "planting",
                            "maintenance",
                            "name"
                        ],
                        "tie_breaker":0.3
                    }
                }
            ]
        }
    }
}
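
Since the question is about a search form whose inputs can be combined freely, the nested structure above can also be generated instead of written by hand. The following is a minimal sketch under stated assumptions: `build_query`, `keyword_clause`, `any_of`, and `TEXT_FIELDS` are hypothetical names, the form is assumed to deliver its multi-select values as a dict of field name to list, and the date range input is left out. It only builds the request body as a Python dict and does not send it anywhere.

```python
import json

# Free-text fields searched by the keyword inputs (taken from the queries above).
TEXT_FIELDS = ["description", "planting", "maintenance", "name"]


def keyword_clause(text):
    """Build the cross_fields multi_match clause used for keyword inputs."""
    return {
        "multi_match": {
            "query": text,
            "type": "cross_fields",
            "fields": TEXT_FIELDS,
            "tie_breaker": 0.3,
        }
    }


def any_of(field, values):
    """Build a nested bool that requires at least one of values on field."""
    return {
        "bool": {
            "should": [{"match": {field: value}} for value in values],
            "minimum_should_match": 1,
        }
    }


def build_query(include, exclude=None, choices=None):
    """Combine keyword inputs and multi-select choices into one bool query."""
    must = [keyword_clause(include)]
    for field, values in (choices or {}).items():
        if values:  # skip dropdowns the user left empty
            must.append(any_of(field, values))
    bool_query = {"must": must}
    if exclude:
        bool_query["must_not"] = [keyword_clause(exclude)]
    return {"query": {"bool": bool_query}}


body = build_query(
    "disease resistant",
    exclude="lavender",
    choices={
        "type": ["Perennial", "Annual"],
        "petals.color": ["purple", "pink", "white"],
    },
)
print(json.dumps(body, indent=4))
```

Each multi-select field gets its own nested "bool" with its own "minimum_should_match", so adding another dropdown adds one more entry to "must" without affecting the thresholds of the others.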