Nested bool query does not match a subset of product attributes

Time: 2019-01-18 10:55:59

Tags: elasticsearch

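For several days now I have been trying to index products in ES. Besides name, sku, id, price and attr_set_name, each product has a few attribute subsets, which mainly define the available color options or the available product sizes. The product structure looks roughly like this: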

Array
(
    [id] => 52
    [sku] => CHAI_0018
    [name] => Chair
    [name_lc] => chair
    [attr_set_name] => Chairs
    [price] => 34.00
    [attributes] => Array
        (
            [0] => 470
            [1] => 815
            [2] => 560
        )
    [super_attr] => Array
        (
            [0] => Olive
            [1] => Black
            [2] => Blue
            [3] => Clear
            [4] => Dark Grey
            [5] => Green
            [6] => Grey
            [7] => Light Blue
            [8] => Orange
            [9] => Purple
            [10] => Red
            [11] => White
            [12] => Yellow
        )
)
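The dump above is PHP `print_r` output. It corresponds roughly to the array below, which is what would be JSON-encoded as the document body when indexing. The value types (integers for `id` and `attributes`, a float for `price`) are assumptions, because `print_r` does not show them.

```php
<?php
// Hypothetical reconstruction of the parent-product document from the
// print_r() dump; value types are assumed.
$product = [
    'id'            => 52,
    'sku'           => 'CHAI_0018',
    'name'          => 'Chair',
    'name_lc'       => 'chair',
    'attr_set_name' => 'Chairs',
    'price'         => 34.00,
    'attributes'    => [470, 815, 560],
    'super_attr'    => ['Olive', 'Black', 'Blue', 'Clear', 'Dark Grey', 'Green',
                        'Grey', 'Light Blue', 'Orange', 'Purple', 'Red', 'White', 'Yellow'],
];

// PHP lists become JSON arrays; Elasticsearch has no separate array type and
// indexes each element as one more value of the same field.
$json = json_encode($product, JSON_PRETTY_PRINT);
echo $json, PHP_EOL;
```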

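I don't index many individual products, only configurable ones, and for those I index just the parent product. So I need to support searching by any of the following fields:

  • name_lc (lowercased name)
  • sku
  • attr_set_name
  • attributes (I also tried imploding it into a string)
  • super_attr (also tried imploding it into a string)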

or by combinations such as "olive chair" or "chair 470".

I create the index with the following mapping:

        $params = [
            'index' => $index,
            'body' => [
                'settings' => [
                    'analysis' => [
                        'analyzer' => [
                            'autocomplete' => [
                                'tokenizer' => 'autocomplete',
                                'filter' => [
                                    'lowercase'
                                ]
                            ],
                            'autocomplete_search' => [
                                'tokenizer' => 'lowercase'
                            ]
                        ],
                        'tokenizer' => [
                            'autocomplete' => [
                                'type' => 'edge_ngram',
                                'min_gram' => 1,
                                'max_gram' => 50,
                                'token_chars' => [
                                    'letter',
                                ]
                            ]
                        ]
                    ]
                ],
                'mappings' => [
                    '_doc' => [
                        'properties' => [
                            'name_lc' => [
                                'type' => 'text',
                                'analyzer' => 'autocomplete',
                                'search_analyzer' => 'autocomplete_search'
                            ],
                            'sku' => [
                                'type' => 'text',
                                'analyzer' => 'autocomplete',
                                'search_analyzer' => 'autocomplete_search'
                            ],
                            'attr_set_name' => [
                                'type' => 'text',
                                'analyzer' => 'autocomplete',
                                'search_analyzer' => 'autocomplete_search'
                            ],
                            'attributes' => [
                                'type' => 'text',
                                'analyzer' => 'autocomplete',
                                'search_analyzer' => 'autocomplete_search'
                            ],
                            'super_attr' => [
                                'type' => 'text',
                                'analyzer' => 'autocomplete',
                                'search_analyzer' => 'autocomplete_search'
                            ],
                            'id' => [
                                'type' => 'integer'
                            ],
                            'price' => [
                                'type' => 'float'
                            ]
                        ]
                    ]
                ]
            ]
        ];
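To see what the two custom analyzers produce, here is a rough PHP approximation of them. This is a hypothetical sketch, not Elasticsearch itself. The function names are made up and lowercasing is ASCII-only. For authoritative tokens, run the `_analyze` API against the real index.

```php
<?php
// Rough, hypothetical approximation of the custom analyzers in the mapping.
// Not Elasticsearch: lowercasing here is ASCII-only, and real tokens should
// be checked with the _analyze API.

// Split on every non-letter, like token_chars => ['letter'] and like the
// built-in lowercase tokenizer. Digits, '_' and spaces never end up in a token.
function letterWords(string $text): array
{
    return preg_split('/[^\p{L}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// 'autocomplete': edge_ngram tokenizer (min_gram 1, max_gram 50, letters
// only) followed by the lowercase token filter.
function autocompleteIndexTokens(string $text, int $minGram = 1, int $maxGram = 50): array
{
    $tokens = [];
    foreach (letterWords($text) as $word) {
        $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
        $limit = min(count($chars), $maxGram);
        for ($n = $minGram; $n <= $limit; $n++) {
            // Each prefix ("edge n-gram") of the word becomes a token.
            $tokens[] = strtolower(implode('', array_slice($chars, 0, $n)));
        }
    }
    return $tokens;
}

// 'autocomplete_search': lowercase tokenizer only, so whole words, lowercased.
function autocompleteSearchTokens(string $text): array
{
    return array_map('strtolower', letterWords($text));
}

print_r(autocompleteIndexTokens('CHAI_0018'));  // c, ch, cha, chai
print_r(autocompleteIndexTokens('470'));        // empty array
print_r(autocompleteSearchTokens('chair 470')); // chair
```

Under this approximation, the numeric values in `attributes` (470, 815, 560) yield no tokens at all, because `token_chars` is limited to `letter`. The search analyzer likewise drops the digits from a phrase such as "chair 470". It is worth confirming this with `_analyze` before relying on it.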

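After studying the ES documentation and going through dozens of articles found on Google, the closest and most successful result I got came from the following query: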

        $params = [
            'index' => $index,
            'size' => $hits,
            'type' => '_doc',
            'body' => [
                "query" => [
                    "bool" => [
                        "must" => [
                            ["match" => [
                                "name_lc" => [
                                    "query" => $phrase,
                                    'operator' => 'or',
                                ]]],
                        ],
                        "should" => [
                            ["match" => [
                                "name_lc" => [
                                    "query" => $phrase,
                                    'fuzziness' => '1',
                                    'operator' => 'and',
//                                    "boost" => 15
                                ]]],
                            ["bool" => [
                                "should" => [
                                    ["match" => [
                                        "sku" => [
                                            "query" => $phrase,
                                            'operator' => 'or',
                                            "boost" => 5
                                        ]]
                                    ],
                                    ["match" => [
                                        "attr_set_name" => [
                                            "query" => $phrase,
                                            'fuzziness' => '1',
                                            'operator' => 'or',
                                            "boost" => 5
                                        ]]
                                    ],
                                    ["match" => [
                                        "super_attr" => [
                                            'query' => $phrase,
                                            'fuzziness' => '1',
                                            'operator' => 'or',
                                            "boost" => 5
                                        ]]
                                    ],
                                    ["match" => [
                                        "attributes" => [
                                            'query' => $phrase,
                                            'fuzziness' => '1',
                                            'operator' => 'or',
                                            "boost" => 5
                                        ]]
                                    ],
                                ]
                            ]]
                        ],
                        "filter" => [
                            "range" => [
                                "price" => [
                                    "gte" => 0,
                                    "lte" => 1000,
                                    "boost" => 2.0
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ];

However, this query returns no results when I search for these words:

  • "Olive" (super_attr)
  • "Chairs" (attr_set_name)
  • "CHAI" (prefix of the SKU)

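This behavior makes me think that the nested bool query is not working properly. I tried various combinations of numbers and mappings, and I tried lowercasing both the searchable fields and the input text, but got nowhere. I can't tell whether the problem is a wrong mapping or the query itself. I am using ES 6.5. I get no syntax errors, just no results. I know that searching across this many fields may not be the fastest, but the index holds fewer than 500 products, so I don't expect performance problems.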

Oh, and the price range filter filters correctly.
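One documented rule seems relevant to the "Olive" case. When a bool query has a `must` or `filter` clause, its `should` clauses become optional and only influence scoring. The sketch below models just that rule with plain booleans. `boolMatches` is a hypothetical helper, scoring is ignored, and each clause is assumed to have already been evaluated for a single document.

```php
<?php
// Hypothetical, simplified model of bool-query matching for one document.
// Each clause is already reduced to true (matches) or false; scoring is ignored.
function boolMatches(array $must, array $should, array $filter, ?int $minimumShouldMatch = null): bool
{
    // Every must clause and every filter clause is required.
    foreach (array_merge($must, $filter) as $clauseMatched) {
        if (!$clauseMatched) {
            return false;
        }
    }
    // Default rule: with a must or filter clause present, should clauses are
    // optional; otherwise at least one of them has to match.
    if ($minimumShouldMatch === null) {
        $minimumShouldMatch = ($must !== [] || $filter !== [] || $should === []) ? 0 : 1;
    }
    return count(array_filter($should)) >= $minimumShouldMatch;
}

// The phrase "Olive" against the Chair document (id 52):
$must   = [false];       // match on name_lc: "chair" has no "olive" token
$should = [false, true]; // fuzzy name_lc match fails, nested bool (super_attr) matches
$filter = [true];        // price 34.00 is inside 0..1000
var_dump(boolMatches($must, $should, $filter)); // bool(false)

// Same clauses without the must, requiring at least one should clause:
var_dump(boolMatches([], $should, $filter, 1)); // bool(true)
```

In this model, the `must` clause on `name_lc` rejects the document before the nested `should` clauses can help. That is consistent with the answer below, which drops the `must` clause and puts all clauses in a `should` list with `minimum_should_match`.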

Any help would be greatly appreciated. Thanks.

1 answer:

Answer 0 (score: 0)

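OK, I wasn't far from what I wanted to achieve. I'm posting my solution for anyone who ends up here. In short, my mapping was wrong because I used the text type for these fields: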

  • id
  • sku
  • attr_set_name
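Switching these fields to keyword changes how they match. A keyword field without a normalizer stores each value verbatim, and a term query compares the query value with it exactly, including case. The helper below is a hypothetical model of that comparison. Multi-valued fields match when any one value is equal. The helper does not call Elasticsearch.

```php
<?php
// Hypothetical model of a term query on a keyword field (no normalizer).
// $storedValues holds all values of the field for one document, since any
// Elasticsearch field can contain several values.
function termMatchesKeyword(array $storedValues, string $queryValue): bool
{
    // Exact, case-sensitive, whole-value comparison: no tokenizing,
    // no lowercasing, no prefixes.
    return in_array($queryValue, $storedValues, true);
}

var_dump(termMatchesKeyword(['CHAI_0018'], 'CHAI_0018')); // bool(true)
var_dump(termMatchesKeyword(['CHAI_0018'], 'chai_0018')); // bool(false)
var_dump(termMatchesKeyword(['CHAI_0018'], 'CHAI'));      // bool(false)
```

In this model, a prefix such as "CHAI" does not match a keyword SKU through a term clause. Only the full, correctly cased value does.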

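After updating my mapping and setting these fields to type "keyword", things got much better. I decided to rework my query and use "term" to search these fields for exact matches. It then turned out that exact matches needed a slightly higher boost. After a few more cosmetic changes, the final mapping looks like this: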

        $params = [
            'index' => $index,
            'body' => [
                'settings' => [
                    'analysis' => [
                        'analyzer' => [
                            'autocomplete' => [
                                'tokenizer' => 'autocomplete',
                                'filter' => [
                                    'lowercase'
                                ]
                            ],
                            'autocomplete_search' => [
                                'tokenizer' => 'lowercase'
                            ]
                        ],
                        'tokenizer' => [
                            'autocomplete' => [
                                'type' => 'edge_ngram',
                                'min_gram' => 1,
                                'max_gram' => 50,
                                'token_chars' => [
                                    'letter',
                                ]
                            ]
                        ]
                    ]
                ],
                'mappings' => [
                    '_doc' => [
                        'properties' => [
                            'id' => [
                                'type' => 'keyword'
                            ],
                            'sku' => [
                                'type' => 'keyword',
                            ],
                            'attr_set_name' => [
                                'type' => 'keyword',
                            ],
                            'name_search' => [
                                'type' => 'text',
                                'analyzer' => 'autocomplete',
                                'search_analyzer' => 'autocomplete_search'
                            ],
                            'attributes' => [
                                'type' => 'text',
                                'analyzer' => 'autocomplete',
                                'search_analyzer' => 'autocomplete_search'
                            ],
                            'super_attr' => [
                                'type' => 'text',
                                'analyzer' => 'autocomplete',
                                'search_analyzer' => 'autocomplete_search'
                            ],
                            'price' => [
                                'type' => 'float'
                            ]
                        ]
                    ]
                ]
            ]
        ];
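Several fields in the final mapping share identical definitions. If the mapping grows, a small helper can keep those definitions consistent. The functions below are a hypothetical refactoring, not part of the elasticsearch-php client. They return the same `_doc` mapping as the one above.

```php
<?php
// Hypothetical helpers that build the repeated field definitions once.
function autocompleteField(): array
{
    return [
        'type'            => 'text',
        'analyzer'        => 'autocomplete',
        'search_analyzer' => 'autocomplete_search',
    ];
}

function keywordField(): array
{
    return ['type' => 'keyword'];
}

// Same '_doc' mapping as the final mapping above.
function buildProductMapping(): array
{
    return [
        '_doc' => [
            'properties' => [
                'id'            => keywordField(),
                'sku'           => keywordField(),
                'attr_set_name' => keywordField(),
                'name_search'   => autocompleteField(),
                'attributes'    => autocompleteField(),
                'super_attr'    => autocompleteField(),
                'price'         => ['type' => 'float'],
            ],
        ],
    ];
}

echo json_encode(buildProductMapping(), JSON_PRETTY_PRINT), PHP_EOL;
```

It could be plugged in as `'mappings' => buildProductMapping()` inside `$params`, with the analysis settings left unchanged.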

and the query that goes with it:

        $params = [
            'index' => $index,
            'size' => $hits,
            'type' => '_doc',
            'body' => [
                "query" => [
                    "bool" => [
                        'should' => [
                            ['match' => [
                                'name_search' => [
                                    'query' => $phrase,
                                    'operator' => 'and',
                                    'boost' => 4,
                                ]
                            ]],
                            ['match' => [
                                'name_search' => [
                                    'query' => $phrase,
                                    'operator' => 'or',
                                    'fuzziness' => '1',
                                    'boost' => 2,
                                ]
                            ]],
                            ['match' => ['attr_set_name' => $phrase]],
                            ['match' => [
                                'super_attr' => [
                                    'query' => $phrase,
                                    'boost' => 7,
                                ]
                            ]],
                            ['match' => ['attributes' => $phrase]],
                            ['term' => [
                                'sku' => [
                                    'value' => $phrase,
                                    "boost" => 5.0
                                ]
                            ]],
                            ['term' => [
                                'id' => [
                                    'value' => $phrase,
                                    "boost" => 5.0
                                ]
                            ]],
                        ],
                        "filter" => [
                            "range" => [
                                "price" => [
                                    "gte" => $min,
                                    "lte" => $max,
                                    "boost" => 2.0
                                ]
                            ]
                        ],
                        "minimum_should_match" => 2,
                    ],
                ]
            ]
        ];
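The final query relies on `"minimum_should_match" => 2`. A product is returned only when at least two of the seven `should` clauses match and the price filter passes. The sketch below models that counting with booleans. `passesFinalQuery` and the clause labels are hypothetical, and relevance scoring is ignored.

```php
<?php
// Hypothetical, simplified model of the final query's matching rule.
// $clauseMatches maps a label for each of the seven should clauses to
// true/false for one document; relevance scoring is ignored.
function passesFinalQuery(array $clauseMatches, bool $priceInRange, int $minimumShouldMatch = 2): bool
{
    // The range on price sits in "filter", so it is mandatory.
    if (!$priceInRange) {
        return false;
    }
    // "minimum_should_match" => 2: at least two should clauses must match.
    return count(array_filter($clauseMatches)) >= $minimumShouldMatch;
}

// A phrase that fully matches the product name: the 'and' clause matches,
// so the fuzzy 'or' clause on the same field matches as well.
$nameOnly = [
    'name_search (and)'       => true,
    'name_search (or, fuzzy)' => true,
    'attr_set_name'           => false,
    'super_attr'              => false,
    'attributes'              => false,
    'sku (term)'              => false,
    'id (term)'               => false,
];
var_dump(passesFinalQuery($nameOnly, true)); // bool(true)

// A phrase that matches a single other field only, e.g. super_attr.
$colorOnly = array_merge(array_fill_keys(array_keys($nameOnly), false), ['super_attr' => true]);
var_dump(passesFinalQuery($colorOnly, true)); // bool(false)
```

The fuzzy `or` clause on `name_search` matches whenever the `and` clause does. So in this model a full name match alone already counts as two clauses, while a phrase that matches only one other field counts as one.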

I hope this helps someone someday :D