Group by part of a string in a field in Elasticsearch

Date: 2016-08-25 08:46:46

Tags: elasticsearch elasticsearch-query

My index contains documents with this structure:

[
    {
        "Id":"1",
        "Path":"/Series/Current/SerieA/foo/foo",
        "PlayCount":100
    },
    {
        "Id":"2",
        "Path":"/Series/Current/SerieA/bar/foo",
        "PlayCount":1000
    },
    {
        "Id":"3",
        "Path":"/Series/Current/SerieA/bar/bar",
        "PlayCount":50
    },
    {
        "Id":"4",
        "Path":"/Series/Current/SerieB/bla/bla",
        "PlayCount":300
    },
    {
        "Id":"5",
        "Path":"/Series/Current/SerieB/goo/boo",
        "PlayCount":200
    },
    {
        "Id":"6",
        "Path":"/Series/Current/SerieC/foo/zoo",
        "PlayCount":100
    }
]

I want to run an aggregation that returns the sum of "PlayCount" for each series, like this:

[
    {
        "key":"serieA",
        "TotalPlayCount":1150
    },
    {
        "key":"serieB",
        "TotalPlayCount":500
    },
    {
        "key":"serieC",
        "TotalPlayCount":100
    }
]

This is what I tried. The query fails, so this is clearly not the right way to do it:

{
    "size": 0,
    "query":{
        "filtered":{
            "query":{
                "regexp":{
                    "Path":"/Series/Current/.*"
                }
            }
        }
    },
    "aggs":{
        "play_count_for_current_series":{
            "terms": {
                "field": "Path", 
                "regexp": "/Series/Current/([^/]+)"
            },
            "aggs":{
                "Total_play": { "sum": { "field": "PlayCount" } }
            }
        }
    }
}

Is there a way to do this?

1 answer:

Answer 0 (score: 0)

My suggestion is the following:

DELETE test
PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_special_filter": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "/Series/Current/([^/]+)"
          ]
        }
      },
      "analyzer": {
        "my_special_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "my_special_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "Path": {
          "type": "string",
          "fields": {
            "for_aggregations": {
              "type": "string",
              "analyzer": "my_special_analyzer"
            },
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

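This creates a special analyzer that uses a pattern_capture filter to capture only the terms you are interested in. Because I did not want to change the field's current mapping, I added a fields section with a sub-field, for_aggregations, that uses this special analyzer. I also added a raw sub-field that is not_analyzed, which helps with the query itself.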
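To make the effect of the analyzer easier to picture, here is a rough Python emulation of what it does to a Path value. This is only a sketch and not Elasticsearch code: the function name emulate_analyzer is made up for illustration, and it assumes that a token matching none of the patterns is passed through unchanged, which is my understanding of how the underlying Lucene filter behaves.

```python
import re

# Same pattern as in the "my_special_filter" definition above.
PATTERN = re.compile(r"/Series/Current/([^/]+)")


def emulate_analyzer(text):
    """Rough emulation: whitespace tokenizer, then capture-group extraction."""
    tokens = []
    for token in text.split():
        # Collect every capture group from every match in this token.
        captured = [g for m in PATTERN.finditer(token) for g in m.groups() if g]
        # Assumption: a token with no match is emitted unchanged.
        tokens.extend(captured if captured else [token])
    return tokens


print(emulate_analyzer("/Series/Current/SerieA/foo/foo"))
print(emulate_analyzer("/Sersdasdies/Curradent/SerieC/foo/zoo"))
```

If that assumption holds, a path outside /Series/Current/ keeps its full text as a term in the sub-field, which is why the search still filters on Path.raw.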

POST test/test/_bulk
{"index":{}}
{"Id":"1","Path":"/Series/Current/SerieA/foo/foo","PlayCount":100}
{"index":{}}
{"Id":"2","Path":"/Series/Current/SerieA/bar/foo","PlayCount":1000}
{"index":{}}
{"Id":"3","Path":"/Series/Current/SerieA/bar/bar","PlayCount":50}
{"index":{}}
{"Id":"4","Path":"/Series/Current/SerieB/bla/bla","PlayCount":300}
{"index":{}}
{"Id":"5","Path":"/Series/Current/SerieB/goo/boo","PlayCount":200}
{"index":{}}
{"Id":"6","Path":"/Series/Current/SerieC/foo/zoo","PlayCount":100}
{"index":{}}
{"Id":"7","Path":"/Sersdasdies/Curradent/SerieC/foo/zoo","PlayCount":100}

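In the search request, the aggregation does not need a regular expression, because it runs on the sub-field that contains only the SerieX term you want. The regexp on Path.raw only restricts the search to documents under /Series/Current/.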

GET /test/test/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "regexp": {
          "Path.raw": "/Series/Current/.*"
        }
      }
    }
  },
  "aggs": {
    "play_count_for_current_series": {
      "terms": {
        "field": "Path.for_aggregations"
      },
      "aggs": {
        "Total_play": {
          "sum": {
            "field": "PlayCount"
          }
        }
      }
    }
  }
}

The result is:

  "play_count_for_current_series": {
     "doc_count_error_upper_bound": 0,
     "sum_other_doc_count": 0,
     "buckets": [
        {
           "key": "SerieA",
           "doc_count": 3,
           "Total_play": {
              "value": 1150
           }
        },
        {
           "key": "SerieB",
           "doc_count": 2,
           "Total_play": {
              "value": 500
           }
        },
        {
           "key": "SerieC",
           "doc_count": 1,
           "Total_play": {
              "value": 100
           }
        }
     ]
  }
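
As a cross-check on those totals, the same grouping can be reproduced client-side in a few lines of Python. This is only a sanity-check sketch over the seven sample documents, not a substitute for the aggregation. The names DOCS and total_play_by_series are hypothetical, and re.match stands in for the regexp query on Path.raw.

```python
import re
from collections import defaultdict

# The seven sample documents from the bulk request, as (Path, PlayCount).
DOCS = [
    ("/Series/Current/SerieA/foo/foo", 100),
    ("/Series/Current/SerieA/bar/foo", 1000),
    ("/Series/Current/SerieA/bar/bar", 50),
    ("/Series/Current/SerieB/bla/bla", 300),
    ("/Series/Current/SerieB/goo/boo", 200),
    ("/Series/Current/SerieC/foo/zoo", 100),
    ("/Sersdasdies/Curradent/SerieC/foo/zoo", 100),
]

SERIES = re.compile(r"/Series/Current/([^/]+)")


def total_play_by_series(docs):
    """Sum PlayCount per series name taken from the path."""
    totals = defaultdict(int)
    for path, play_count in docs:
        match = SERIES.match(path)
        if match is None:
            # Skip paths outside /Series/Current/, like the regexp query does.
            continue
        totals[match.group(1)] += play_count
    return dict(totals)


print(total_play_by_series(DOCS))
```

The seventh document is skipped, so its PlayCount is not added to SerieC.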