Elasticsearch: aggregate on part of a string instead of the full string

Time: 2015-06-19 23:44:31

Tags: node.js elasticsearch aggregation

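Basically, what I'm trying to do here is get the second-level-down category from a hierarchically stored string. The problem is that the number of levels in the hierarchy varies: one product category may have six levels and another only four. Otherwise I would simply have implemented predefined levels.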

I have some products like this:

[
  {
    title: 'product one',
    categories: [
      'clothing/mens/shoes/boots/steel-toe'
    ]
  },
  {
    title: 'product two',
    categories: [
      'clothing/womens/tops/sweaters/open-neck'
    ]
  },
  {
    title: 'product three',
    categories: [
      'clothing/kids/shoes/sneakers/light-up'
    ]
  },
  {
    title: 'product etc.',
    categories: [
      'clothing/baby/bibs/super-hero'
    ]
  }, 
  ... more products
]

I'm trying to get aggregation buckets like this:

buckets: [
  {
    key: 'clothing/mens',
    ...
  },
  {
    key: 'clothing/womens',
    ...
  },
  {
    key: 'clothing/kids',
    ...
  },
  {
    key: 'clothing/baby',
    ...
  },
]

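I've looked at filter prefixes and at include and exclude terms, but I couldn't find anything that works. Could someone please point me in the right direction?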

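1 answer:

Answer 0 (score: 2)

Your category field should be analyzed with a custom analyzer. You may have other plans for category, so I'll just add a sub-field that is used only for aggregations: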

{
  "settings": {
    "analysis": {
      "filter": {
        "category_trimming": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "(^\\w+\/\\w+)"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "category_trimming",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "category": {
          "type": "string",
          "fields": {
            "just_for_aggregations": {
              "type": "string",
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}
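
To see what the category_trimming filter is expected to emit, the pattern can be tried outside Elasticsearch. The following is only a rough sketch: trimCategory is a hypothetical helper, and it assumes that JavaScript's regex engine handles this simple pattern the same way as the Java engine Elasticsearch uses, and that a token which does not match is passed through unchanged.

```javascript
// Approximates the "category_trimming" pattern_capture filter followed by
// the "lowercase" filter from the settings above.
// Assumption: JavaScript and Java regexes agree on this simple pattern.
const TOP_TWO_LEVELS = /(^\w+\/\w+)/;

// Hypothetical helper, not part of Elasticsearch or any client library.
function trimCategory(category) {
  const match = TOP_TWO_LEVELS.exec(category);
  // Assumption: when nothing matches, the original token is kept as-is.
  const token = match ? match[1] : category;
  return token.toLowerCase();
}

const samples = [
  'clothing/womens/tops/sweaters/open-neck',
  'clothing/mens/shoes/boots/steel-toe',
  'clothing/kids/shoes/sneakers/light-up',
  'clothing/baby/bibs/super-hero',
];

for (const sample of samples) {
  console.log(trimCategory(sample));
}
```

Note that \w does not match a hyphen, so a hypothetical value such as clothing/t-shirts/long-sleeve would be cut to clothing/t. If second-level names can contain hyphens, a character class such as [^/]+ in place of \w+ is probably a better fit.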

Test data:

POST /index/test/_bulk
{"index":{}}
{"category": "clothing/womens/tops/sweaters/open-neck"}
{"index":{}}
{"category": "clothing/mens/shoes/boots/steel-toe"}
{"index":{}}
{"category": "clothing/kids/shoes/sneakers/light-up"}
{"index":{}}
{"category": "clothing/baby/bibs/super-hero"}

The query itself:

GET /index/test/_search?search_type=count
{
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category.just_for_aggregations",
        "size": 10
      }
    }
  }
}
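
Since the question is tagged node.js, the same request body can be built in JavaScript. This is a minimal sketch, and buildCategoryAggregation is a hypothetical helper. It sets "size": 0 in the body instead of using search_type=count, because search_type=count was deprecated in Elasticsearch 2.0 and removed in later versions. How the body is sent depends on the client you use.

```javascript
// Hypothetical helper: builds the body of a terms aggregation request.
function buildCategoryAggregation(field, maxBuckets) {
  return {
    size: 0, // return no hits, only the aggregation results
    aggs: {
      by_category: {
        terms: { field: field, size: maxBuckets },
      },
    },
  };
}

// Same field and bucket limit as in the query above.
const body = buildCategoryAggregation('category.just_for_aggregations', 10);
console.log(JSON.stringify(body, null, 2));
```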

Result:

   "aggregations": {
      "by_category": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "clothing/baby",
               "doc_count": 1
            },
            {
               "key": "clothing/kids",
               "doc_count": 1
            },
            {
               "key": "clothing/mens",
               "doc_count": 1
            },
            {
               "key": "clothing/womens",
               "doc_count": 1
            }
         ]
      }
   }
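
On the Node.js side, the buckets can be reduced to a plain key-to-count map. The sketch below assumes the response has the shape shown above. bucketCounts is a hypothetical helper, and sampleResponse simply repeats the result from this answer.

```javascript
// Hypothetical helper: maps each bucket key to its doc_count.
function bucketCounts(response, aggName) {
  const agg = response.aggregations && response.aggregations[aggName];
  const buckets = agg ? agg.buckets : [];
  const counts = {};
  for (const bucket of buckets) {
    counts[bucket.key] = bucket.doc_count;
  }
  return counts;
}

// Copy of the aggregation result shown above.
const sampleResponse = {
  aggregations: {
    by_category: {
      doc_count_error_upper_bound: 0,
      sum_other_doc_count: 0,
      buckets: [
        { key: 'clothing/baby', doc_count: 1 },
        { key: 'clothing/kids', doc_count: 1 },
        { key: 'clothing/mens', doc_count: 1 },
        { key: 'clothing/womens', doc_count: 1 },
      ],
    },
  },
};

console.log(bucketCounts(sampleResponse, 'by_category'));
```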