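Terms aggregation on a nested field splits values that contain spaces

Time: 2017-08-25 13:55:13

Tags: elasticsearch elasticsearch-5

I have documents like the following in an Elasticsearch index: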

{
                "_index": "ecommerce",
                "_type": "products",
                "_id": "12895",
                "_score": 1,
                "_source": {
                    "title": "Blue Armani Jeans",
                    "slug": "blue-armani-jeans",
                    "price": 200,
                    "sale_price": 0,
                    "vendor_id": 62,
                    "featured": 0,
                    "viewed": 0,
                    "stock": 1,
                    "sku": "arm-jeans",
                    "brand": "",
                    "rating": 0,
                    "active": 0,
                    "vendor_name": "Armani",
                    "category": [
                        "Men Fashion",
                        "Casual Wear"
                    ],
                    "image": "armani-jeans.jpg",
                    "variations": [
                        {
                            "variation_id": "32",
                            "stock": 10,
                            "price": 199,
                            "variation_image": "",
                            "sku": "arm-jeans-11",
                            "Size": "38",
                            "Color": "Blue"
                        },
                        {
                            "variation_id": "33",
                            "stock": 10,
                            "price": 199,
                            "variation_image": "",
                            "sku": "arm-jeans-12",
                            "Size": "40",
                            "Color": "Blue"
                        }
                    ]
                }
            },

I am using a query that uses aggregations to list all the variation values available as filters.

Query:

{
    "size": 0,
    "aggs": {
        "variations": {
            "nested": {
                "path": "variations"
            },
            "aggs": {
                "size": {
                    "terms": {
                        "field": "variations.Size"
                    }
                },
                "color": {
                    "terms": {
                        "field": "variations.Color"
                    }
                },
                "brand": {
                    "reverse_nested": {},
                    "aggs": {
                        "brand": {
                            "value_count": {
                                "field": "brand"
                            }
                        }
                    }
                }
            }
        }
    }
}

Output:

"color": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 543,
                "buckets": [
                    {
                        "key": "black",
                        "doc_count": 298
                    },
                    {
                        "key": "blue",
                        "doc_count": 227
                    },
                    {
                        "key": "brown",
                        "doc_count": 170
                    },
                    {
                        "key": "white",
                        "doc_count": 153
                    },
                    {
                        "key": "pink",
                        "doc_count": 127
                    },
                    {
                        "key": "grey",
                        "doc_count": 120
                    },
                    {
                        "key": "multi",
                        "doc_count": 99
                    },
                    {
                        "key": "red",
                        "doc_count": 89
                    },
                    {
                        "key": "color",
                        "doc_count": 81
                    },
                    {
                        "key": "green",
                        "doc_count": 76
                    }
                ]
            },
            "brand": {
                "doc_count": 621,
                "brand": {
                    "value": 6
                }
            },
            "size": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 517,
                "buckets": [
                    {
                        "key": "size",
                        "doc_count": 195
                    },
                    {
                        "key": "s",
                        "doc_count": 158
                    },
                    {
                        "key": "free",
                        "doc_count": 156
                    },
                    {
                        "key": "m",
                        "doc_count": 140
                    },
                    {
                        "key": "l",
                        "doc_count": 134
                    },
                    {
                        "key": "xl",
                        "doc_count": 102
                    },
                    {
                        "key": "9",
                        "doc_count": 69
                    },
                    {
                        "key": "8",
                        "doc_count": 68
                    },
                    {
                        "key": "10",
                        "doc_count": 67
                    },
                    {
                        "key": "11",
                        "doc_count": 61
                    }
                ]
            }

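The results are fine for values that contain no spaces, but a variation such as "Free Size" gets split into "free" and "size".

How can I have such values treated as a single variation value? Or is there a specialized query for this situation?

1 Answer:

Answer 0: (score: 0)

The problem is that your mapping most likely looks like this: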

...
"variations": {
  "properties": {
    "Size": {
      "type": "text",
      "analyzer": "standard"
    ...

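This is a bit of an oversimplification, but when Elasticsearch indexes a document it first analyzes the text fields: it splits them into tokens, modifies the tokens to make them better suited for search, and then stores an index of how many times each token appears in each document. For example, if you have a text that says "Dogs are great" and someone searches for "dogs", you want that text to match, because it is about dogs. Elasticsearch is very powerful and can be used for all sorts of things, but its primary purpose is natural-language text search, so that is what it is set up for by default. If you want different behavior, you need to tell it so explicitly (using a mapping).

When you run a terms aggregation, going through the raw text of every document would be very inefficient compared with simply using the index that has already been built, which conveniently contains the term counts for each document. With standard analyzed text, the "terms" in this case are "free" and "size", not "Free Size". If you want the whole field value indexed as a single term, you can use the "keyword" type instead of the "text" type: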

...
"variations": {
  "properties": {
    "Size": {
      "type": "keyword"
    ...
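
To make the difference concrete, here is a minimal pure-Python sketch of the buckets a terms aggregation would produce in each case. It does not talk to Elasticsearch: standard_like_tokens is a hypothetical helper that only approximates the standard analyzer (split on non-alphanumeric characters, then lowercase), and the sample values are made up for illustration.

```python
import re
from collections import Counter

# Made-up sample values for variations.Size (not taken from the real index).
SIZES = ["Free Size", "Free Size", "S", "M", "Free Size", "XL"]


def standard_like_tokens(value):
    """Rough approximation of the standard analyzer:
    split on non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", value)]


def text_buckets(values):
    """Buckets for an analyzed "text" field.
    Each document is counted at most once per distinct token."""
    counts = Counter()
    for value in values:
        counts.update(set(standard_like_tokens(value)))
    return counts


def keyword_buckets(values):
    """Buckets for a "keyword" field: the whole value is a single term."""
    return Counter(values)


print(text_buckets(SIZES))     # "free" and "size" show up as separate buckets
print(keyword_buckets(SIZES))  # "Free Size" stays in one bucket
```

With the analyzed field, "Free Size" never appears as a bucket key; with the keyword field it is kept intact, including its original casing.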

If you did not explicitly set any mapping for this field, the default in ES 5 actually already gives you a keyword-mapped sub-field:

...
"variations": {
  "properties": {
    "Size": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
    ...

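This means that, as long as none of your Size values is longer than 256 characters, you can simply update your aggregation to look like this: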

        ...
        "aggs": {
            "size": {
                "terms": {
                    "field": "variations.Size.keyword"
                }
            },
        ...
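
For reference, the full request body from the question with that change applied might be assembled as in the sketch below. The helper name build_variation_aggs is hypothetical, and pointing the color aggregation at variations.Color.keyword is an assumption: it only works if Color also received the default dynamic mapping shown above.

```python
import json


def build_variation_aggs(size_field="variations.Size.keyword",
                         color_field="variations.Color.keyword"):
    """Return the search body from the question, with the terms
    aggregations pointed at the (assumed) keyword sub-fields."""
    return {
        "size": 0,
        "aggs": {
            "variations": {
                "nested": {"path": "variations"},
                "aggs": {
                    "size": {"terms": {"field": size_field}},
                    "color": {"terms": {"field": color_field}},
                    "brand": {
                        "reverse_nested": {},
                        "aggs": {
                            "brand": {"value_count": {"field": "brand"}}
                        },
                    },
                },
            }
        },
    }


body = build_variation_aggs()
# The printed JSON can be sent as the body of a _search request on the index.
print(json.dumps(body, indent=2))
```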

However, unless you are actually using the analyzed field, I would recommend reindexing with a mapping in which Size is a "keyword" field.
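
A reindex along those lines could use request bodies like the ones sketched below. This is only an outline under stated assumptions: the new index name ecommerce_v2 is hypothetical, mapping Color as keyword as well is an assumption (the answer only mentions Size), and all other fields are left to dynamic mapping. The mapping is nested under the type name "products", as ES 5 expects.

```python
import json

# Hypothetical name for the new index; the question only mentions "ecommerce".
NEW_INDEX = "ecommerce_v2"

# Body for creating the new index (PUT /ecommerce_v2), ES 5.x syntax.
create_index_body = {
    "mappings": {
        "products": {
            "properties": {
                "variations": {
                    "type": "nested",
                    "properties": {
                        "Size": {"type": "keyword"},
                        # Assumption: Color should be aggregated as whole values too.
                        "Color": {"type": "keyword"},
                    },
                }
            }
        }
    }
}

# Body for copying the documents over (POST /_reindex).
reindex_body = {
    "source": {"index": "ecommerce"},
    "dest": {"index": NEW_INDEX},
}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(reindex_body, indent=2))
```

After reindexing, the aggregation can target variations.Size directly instead of the .keyword sub-field.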