Elasticsearch query also matches terms that contain a dash

Date: 2017-09-07 13:55:19

Tags: elasticsearch elasticsearch-plugin

My query is similar to the one below:

{
    "size": 15,
    "from": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "match_phrase": {
                                "category": "men_fashion"
                            }
                        },
                        {
                            "match_phrase": {
                                "category": "western_clothing"
                            }
                        },
                        {
                            "match_phrase": {
                                "category": "shirts"
                            }
                        }
                    ]
                }
            }
        }
    }
}

The problem is that it also returns products in the "t-shirts" category. How can I restrict it to exact matches only?

Update: this is the code I use for the mapping

{
    "mappings": {
        "products": {
            "properties": {
                "variations": {
                    "type": "nested"
                }
            }
        }
    }
}

And here is an actual sample product

{
    "title": "100% Cotton Unstitched Suit For Men",
    "slug": "100-cotton-unstitched-suit-for-men",
    "price": 200,
    "sale_price": 0,
    "vendor_id": 32,
    "featured": 0,
    "viewed": 20,
    "stock": 4,
    "sku": "XXX-B",
    "rating": 0,
    "active": 1,
    "vendor_name": "vendor_name",
    "category": [
        "men_fashion",
        "traditional_clothing",
        "unstitched_fabric"
    ],
    "image": "imagename.jpg",
    "variations": [
        {
            "variation_id": "34",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-C",
            "size": "m",
            "color": "red"
        },
        {
            "variation_id": "35",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-D",
            "size": "l",
            "color": "red"
        }
    ]
}

1 Answer:

Answer 0: (score: 0)

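You did not provide any information about the mapping, so I assume the standard analyzer is applied to the category field. Looking at your query (the filtered syntax), I also assume you are using an ES version lower than 5.0.

With the standard analyzer, the value t-shirt in the category field is split into the following terms at index time: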

http://127.0.0.1:9200/_analyze?analyzer=standard&text=t-shirt
{
    "tokens": [
        {
            "token": "t",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "shirt",
            "start_offset": 2,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

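Because the dash is treated as a separator, t-shirts is indexed as the two terms t and shirts. So when you search for shirts, you also get the t-shirts documents.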
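To make the effect concrete, here is a minimal Python sketch that imitates it. This is not Elasticsearch code: `standard_like_tokens` and `matches_phrase` are hypothetical helpers, and the regex only approximates the standard analyzer (lowercasing, splitting on the dash, keeping the underscore inside a token).

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real tokenizer):
    lowercase the text and split on anything that is not a letter, digit or underscore."""
    return re.findall(r"\w+", text.lower())


def matches_phrase(field_values, phrase):
    """Return True if the phrase tokens occur consecutively in any single field value."""
    query = standard_like_tokens(phrase)
    for value in field_values:
        tokens = standard_like_tokens(value)
        for i in range(len(tokens) - len(query) + 1):
            if tokens[i:i + len(query)] == query:
                return True
    return False


categories = ["men_fashion", "western_clothing", "t-shirts"]

print(standard_like_tokens("t-shirts"))      # ['t', 'shirts']
print(matches_phrase(categories, "shirts"))  # True, although no value equals "shirts"
```

The second line of output is the unwanted match from the question: the term shirts exists in the index only because t-shirts was split on the dash.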

If the category field does not need to be analyzed in your use case (you do not need full-text search on it), simply mark the category field as not_analyzed:

{
    "mappings": {
        "data": {
            "properties": {
                "category": {
                    "type":     "string",
                    "index":    "not_analyzed"
                }
            }
        }
    }
}
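With a not_analyzed field, each category is stored as a single exact term, so one option is to filter with term filters instead of match_phrase. The sketch below builds such a request body in Python. Note that the term filter is my assumption about how you might query the field, not something taken from your code, and `exact_category_query` is a hypothetical helper name. The body keeps the pre-5.0 filtered syntax from the question.

```python
import json


def exact_category_query(categories, size=15, offset=0):
    """Build a pre-5.0 'filtered' request that requires every category to match exactly.
    Assumes the category field is mapped as not_analyzed."""
    return {
        "size": size,
        "from": offset,
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        # one term filter per required category value
                        "must": [{"term": {"category": c}} for c in categories]
                    }
                }
            }
        },
    }


body = exact_category_query(["men_fashion", "western_clothing", "shirts"])
print(json.dumps(body, indent=4))
```

The printed JSON can be sent as the search request body. A term filter does not analyze its input, so shirts is compared against the stored values as-is.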

If you need to keep the category content analyzed, you can use the Whitespace analyzer instead (it does not treat the dash as a word separator):

{
    "mappings": {
        "data": {
            "properties": {
                "category": {
                    "type": "string",
                    "analyzer": "whitespace"
                }
            }
        }
    }
}

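Another solution is the Keyword analyzer, but it behaves much like the not_analyzed option.

Which one to pick depends on your needs, but every solution requires changing the index mapping (and reindexing the existing documents). You can check how each analyzer behaves with: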

http://127.0.0.1:9200/_analyze?analyzer=whitespace&text=t-shirt
http://127.0.0.1:9200/_analyze?analyzer=keyword&text=t-shirt
http://127.0.0.1:9200/_analyze?analyzer=standard&text=t-shirt
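If you do not have a node at hand, the difference between the three analyzers can be roughly imitated in Python. This is only a sketch under simplifying assumptions: the `ANALYZERS` table and `compare` are hypothetical names, and the lambdas are approximations, so use the _analyze calls above for the authoritative output.

```python
import re

# Simplified stand-ins for the three analyzers (assumptions, not Elasticsearch code).
ANALYZERS = {
    # lowercases and splits on non-word characters such as the dash
    "standard": lambda text: re.findall(r"\w+", text.lower()),
    # splits on whitespace only and keeps the original case
    "whitespace": lambda text: text.split(),
    # emits the whole input as one token
    "keyword": lambda text: [text],
}


def compare(text):
    """Return the tokens each simplified analyzer produces for the given text."""
    return {name: analyze(text) for name, analyze in ANALYZERS.items()}


for name, tokens in compare("t-shirt").items():
    print(name, tokens)
# standard ['t', 'shirt']
# whitespace ['t-shirt']
# keyword ['t-shirt']
```

One thing to keep in mind: the whitespace analyzer does not lowercase, so a value such as "Polo Shirts" would be indexed as Polo and Shirts, and a search would have to use the same case.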

Additional information

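Basically, you are searching on the category field, so the fact that variations is nested does not matter here. A category field of type string can hold an array of values, so that is not a problem either.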

With this mapping (note "analyzer": "whitespace"):

PUT http://localhost:9200/test
{
    "mappings": {
        "products": {
            "properties": {
                "variations": {
                    "type": "nested",
                    "properties": {
                        "size":    { "type": "string" },
                        "color":   { "type": "string" },
                        ... // other nested fields
                    }
                },
                "category":    { 
                    "type": "string",
                    "analyzer": "whitespace"
                },
                ... // other fields
            }
        }
    }
}

I indexed two documents.

Document 1

{
    "category": [
        "men_fashion",
        "traditional_clothing",
        "unstitched_fabric",
        "shirts"
    ],
    "image": "imagename.jpg",
    "variations": [
        {
            "variation_id": "34",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-C",
            "size": "m",
            "color": "red"
        }
    ]
}

Document 2

{
    "category": [
        "men_fashion",
        "traditional_clothing",
        "unstitched_fabric",
        "t-shirts"
    ],
    "image": "imagename.jpg",
    "variations": [
        {
            "variation_id": "34",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-C",
            "size": "m",
            "color": "red"
        },
        {
            "variation_id": "35",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-D",
            "size": "l",
            "color": "red"
        }
    ]
}

Now I search:

{
    "size": 15,
    "from": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "match_phrase": {
                                "category": "men_fashion"
                            }
                        },
                        {
                            "match_phrase": {
                                "category": "shirts"
                            }
                        }
                    ]
                }
            }
        }
    }
}

I get only document 1.

If needed, you can add "analyzer": "whitespace" to nested fields such as size in the same way (but the search query must then be changed to search the nested documents).
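For completeness, searching a field inside the nested variations objects usually goes through a nested query with a path. The Python sketch below builds such a request body. It is an illustration only: `nested_variation_query` is a hypothetical helper, the choice of a match query on variations.size is my assumption, and the exact syntax may differ between ES versions.

```python
import json


def nested_variation_query(field, value, path="variations"):
    """Build a request body that matches documents having at least one nested
    object under `path` whose `field` matches `value`."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    # nested fields are addressed by their full path
                    "match": {f"{path}.{field}": value}
                },
            }
        }
    }


print(json.dumps(nested_variation_query("size", "m"), indent=4))
```

The field name is prefixed with the nested path (variations.size), which is the part that changes compared with the top-level category search.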