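Elasticsearch histogram of visits

Date: 2015-11-28 10:18:28

Tags: elasticsearch aggregation

I'm quite new to Elasticsearch, and I can't manage to build a histogram based on ranges of visit counts. I'm not even sure whether this kind of chart can be created with a single Elasticsearch query, but I have a feeling it might be possible with a pipeline aggregation or perhaps a scripted aggregation.

Here is the test dataset I'm using: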

PUT /test_histo
{ "settings": { "number_of_shards": 1 }}

PUT /test_histo/_mapping/visit
{
   "properties": {
      "user": {"type": "string" },
      "datevisit": {"type": "date"},
      "page": {"type": "string"}
   }
}

POST test_histo/visit/_bulk
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"John","page":"home.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"Jean","page":"productXX.hmtl","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"Robert","page":"home.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"Mary","page":"home.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"Mary","page":"media_center.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"John","page":"home.html","datevisit":"2015-11-25"}
{"index":{"_index":"test_histo","_type":"visit"}}
{"user":"John","page":"media_center.html","datevisit":"2015-11-26"}

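If we consider the ranges [1,2[, [2,3[, [3,inf.[ of visits per user,

the expected result should be:

  • [1,2[ = 2
  • [2,3[ = 1
  • [3,inf.[ = 1

All my attempts to produce a histogram showing customer visit frequency have been unsuccessful so far. I'd be glad to receive any hints, tips, or ideas in response to my question.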

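3 Answers:

Answer 0: (score: 3)

There are two ways to do this.

The first is to do it inside Elasticsearch, which requires a Scripted Metric Aggregation. You can read more about it in the Elasticsearch documentation on Scripted Metric Aggregation.

Your query would look like this: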

{
  "size": 0,
  "aggs": {
    "visitors_over_time": {
      "date_histogram": {
        "field": "datevisit",
        "interval": "week"
      },
      "aggs": {
        "no_of_visits": {
          "scripted_metric": {
            "init_script": "_agg['values'] = new java.util.HashMap();",
            "map_script": "if (_agg.values[doc['user'].value]==null) {_agg.values[doc['user'].value]=1} else {_agg.values[doc['user'].value]+=1;}",
            "combine_script": "someHashMap = new java.util.HashMap();for(x in _agg.values.keySet()) {value=_agg.values[x];if(value<3){key='[' + value +',' + (value + 1) + '[';}else{key='[3,inf[';}; if(someHashMap[key]==null){someHashMap[key] = 1}else{someHashMap[key] += 1}}; return someHashMap;"
          }
        }
      }
    }
  }
}

You can change the interval field inside the date_histogram object to values such as day, week, or month.

Your response will look like this:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "visitors_over_time": {
      "buckets": [
        {
          "key_as_string": "2015-11-23T00:00:00.000Z",
          "key": 1448236800000,
          "doc_count": 7,
          "no_of_visits": {
            "value": [
              {
                "[2,3[": 1,
                "[3,inf[": 1,
                "[1,2[": 2
              }
            ]
          }
        }
      ]
    }
  }
} 
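To make the three scripts easier to follow, here is a rough sketch in Python (not Elasticsearch script code) of the same counting logic. The function names are made up for illustration. One assumption is worth stating: the query above buckets inside combine_script, which runs once per shard, so it relies on the test index having a single shard. With several shards, a user's documents may be spread across shards, so the sketch merges the per-user counts first and only then assigns ranges, which is roughly what a reduce_script would have to do. Every count of 3 or more goes under the single label [3,inf[, matching the ranges in the question.

```python
from collections import Counter


def range_label(visits):
    """Range label for a visit count; 3 or more visits share one open-ended range."""
    return "[%d,%d[" % (visits, visits + 1) if visits < 3 else "[3,inf["


def map_phase(docs):
    """Per shard: count documents per user (what init_script and map_script build up)."""
    return Counter(doc["user"] for doc in docs)


def reduce_phase(shard_counts):
    """Merge the per-user counts of all shards, then count users per range."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)
    return dict(Counter(range_label(n) for n in total.values()))


# The seven visits from the question (only the user field matters here).
docs = [{"user": u} for u in
        ["John", "Jean", "Robert", "Mary", "Mary", "John", "John"]]

# Pretend the documents live on two shards to show why merging comes first.
shards = [map_phase(docs[:4]), map_phase(docs[4:])]
histogram = reduce_phase(shards)
print(histogram)
```

If the ranges were assigned per shard instead, John would be counted once in [1,2[ and once in [2,3[ in this split, which is why the merge has to happen before labelling.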

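The second approach is to do the work of the scripted_metric on the client side, using the result of a Terms Aggregation. You can read more about it in the Elasticsearch documentation on Terms Aggregation.

Your query would look like this:

GET test_histo/visit/_search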
{
  "size": 0,
  "aggs": {
    "visitors_over_time": {
      "date_histogram": {
        "field": "datevisit",
        "interval": "week"
      },
      "aggs": {
        "no_of_visits": {
          "terms": {
            "field": "user",
            "size": 10
          }
        }
      }
    }
  }
}

and the response will be:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "visitors_over_time": {
      "buckets": [
        {
          "key_as_string": "2015-11-23T00:00:00.000Z",
          "key": 1448236800000,
          "doc_count": 7,
          "no_of_visits": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "john",
                "doc_count": 3
              },
              {
                "key": "mary",
                "doc_count": 2
              },
              {
                "key": "jean",
                "doc_count": 1
              },
              {
                "key": "robert",
                "doc_count": 1
              }
            ]
          }
        }
      ]
    }
  }
}

From that response you can then count, for each period, how many users fall into each doc_count range.
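The client-side counting step could be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and the response dictionary is a trimmed copy of the response shown above. Keep in mind that the terms aggregation only returns the top "size" users per period, so on real data the size of 10 would probably need to be raised for the counts to be complete.

```python
from collections import Counter


def range_label(visits):
    """Range label for a visit count; 3 or more visits share one open-ended range."""
    return "[%d,%d[" % (visits, visits + 1) if visits < 3 else "[3,inf["


def visit_histograms(response):
    """Map each date_histogram bucket to {range label: number of users}."""
    result = {}
    for period in response["aggregations"]["visitors_over_time"]["buckets"]:
        users = period["no_of_visits"]["buckets"]
        result[period["key_as_string"]] = dict(
            Counter(range_label(user["doc_count"]) for user in users))
    return result


# Trimmed copy of the response above: one weekly bucket with four users.
response = {
    "aggregations": {
        "visitors_over_time": {
            "buckets": [{
                "key_as_string": "2015-11-23T00:00:00.000Z",
                "doc_count": 7,
                "no_of_visits": {
                    "buckets": [
                        {"key": "john", "doc_count": 3},
                        {"key": "mary", "doc_count": 2},
                        {"key": "jean", "doc_count": 1},
                        {"key": "robert", "doc_count": 1},
                    ]
                },
            }]
        }
    }
}

histograms = visit_histograms(response)
print(histograms)
```

For the week starting 2015-11-23 this gives two users in [1,2[, one in [2,3[ and one in [3,inf[, which matches the expected result in the question.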

Answer 1: (score: 0)

Take a look at:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html

Use Kibana if you want this in a fancy, ready-made UI.

A query like this:

GET _search
{
   "query": {
      "match_all": {}
   },
   "aggs": {
      "visits": {
         "date_histogram": {
            "field": "datevisit",
            "interval": "month"
         }
      }
   }
}

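should give you a histogram. I don't have Elasticsearch at hand right now, so there may be some fat-fingered typos.

You can then add query terms to show the histogram only for a specific page, or you can wrap it in an outer aggregation that buckets by page or by user.

Something like this: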

GET _search
{
   "query": {
      "match_all": {}
   },
   "aggs": {
      "users": {
         "terms": {
            "field": "user"
         },
         "aggs": {
            "visits": {
               "date_histogram": {
                  "field": "datevisit",
                  "interval": "month"
               }
            }
         }
      }
   }
}

Answer 2: (score: 0)

Take a look at this solution:

{
    "query": {
        "match_all": {}
    },
    "aggs": {
        "periods": {
            "filters": {
                "filters": {
                    "1-2": {
                        "range": {
                            "datevisit": {
                                "gte": "2015-11-25", 
                                "lt": "2015-11-26"
                            }
                        }
                    }, 
                    "2-3": {
                        "range": {
                            "datevisit": {
                                "gte": "2015-11-26", 
                                "lt": "2015-11-27"
                            }
                        }
                    }, 
                    "3-": {
                        "range": {
                            "datevisit": {
                                "gte": "2015-11-27"
                            }
                        }
                    }
                }
            },
            "aggs": {
                "users": {
                    "terms": {"field": "user"}
                }
            }
        }
    }
}

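Step by step:

  • Filters aggregation: lets you define the range values for the next aggregation. In this case we define 3 periods based on date range filters.
  • Nested users aggregation: this aggregation returns one set of results per filter you defined. In this case you therefore get 3 sets of values, one for each date range filter.

You will get a result like this: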

{   
    ...
    "aggregations" : {
        "periods" : {
            "buckets" : {
                "1-2" : {
                    "users" : {
                        "buckets" : [
                            {"key" : XXX,"doc_count" : NNN},
                            {"key" : YYY,"doc_count" : NNN},
                        ]
                    }
                },
                "2-3" : {
                    "users" : {
                        "buckets" : [
                            {"key" : XXX1,"doc_count" : NNN1},
                            {"key" : YYY1,"doc_count" : NNN1},
                        ]
                    }
                },
                "3-" : {
                    "users" : {
                        "buckets" : [
                            {"key" : XXX2,"doc_count" : NNN2},
                            {"key" : YYY2,"doc_count" : NNN2},
                        ]
                    }
                },
            }
        }
    }
}
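Reading such a keyed response on the client could look like the sketch below. It is Python for illustration only: the function name is hypothetical, and the sample dictionary was worked out by hand from the test data in the question rather than captured from a cluster.

```python
def users_per_period(response):
    """Map each named filter bucket to {user: doc_count}."""
    periods = response["aggregations"]["periods"]["buckets"]
    return {
        name: {u["key"]: u["doc_count"] for u in bucket["users"]["buckets"]}
        for name, bucket in periods.items()
    }


# Hand-built sample in the shape shown above (not real cluster output).
sample = {
    "aggregations": {
        "periods": {
            "buckets": {
                "1-2": {"users": {"buckets": [
                    {"key": "john", "doc_count": 2},
                    {"key": "mary", "doc_count": 2},
                    {"key": "jean", "doc_count": 1},
                    {"key": "robert", "doc_count": 1},
                ]}},
                "2-3": {"users": {"buckets": [
                    {"key": "john", "doc_count": 1},
                ]}},
                "3-": {"users": {"buckets": []}},
            }
        }
    }
}

per_period = users_per_period(sample)
print(per_period)
```

Note that these buckets group documents by date range, so a further counting step over the per-user doc_count values would presumably still be needed to obtain the visits-per-user ranges from the question.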

Try it and let me know whether it works.