Same aggregation on multiple metrics in Elasticsearch

Time: 2015-10-15 09:40:26

Tags: php elasticsearch analytics snowplow

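I have set up Snowplow with Elasticsearch.

When I want to retrieve data, I just run a normal query and use aggregations to group the results by day, country, and so on.

Now I want to work out the click-through rate (CTR) for these aggregations. I have two kinds of events: page views and clicks.

Currently I run two queries:

Page views: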

{
    "size": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "event": "page_view"
                            }
                        }
                    ],
                    "must_not": {
                        "term": {
                            "br_family": "Robot"
                        }
                    }
                }
            }
        }
    },
    "aggs": {
        "dates": {
            "date_histogram": {
                "field": "collector_tstamp",
                "interval": "day"
            }
        }
    }
}

Clicks:

{
    "size": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "event": "struct"
                            }
                        },
                        {
                            "term": {
                                "se_action": "click"
                            }
                        }
                    ],
                    "must_not": {
                        "term": {
                            "br_family": "Robot"
                        }
                    }
                }
            }
        }
    },
    "aggs": {
        "dates": {
            "date_histogram": {
                "field": "collector_tstamp",
                "interval": "day"
            }
        }
    }
}

I format the responses into something easier to work with, and then merge them in PHP with something like this:

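function merge_metrics($pv, $c){
    $r = array();

    // Sum page-view counts per bucket name, initialising each key before adding to it.
    if(count($pv) > 0){
        foreach ($pv as $value) {
            if(!isset($r[$value['name']]['page_views'])){
                $r[$value['name']]['page_views'] = 0;
            }
            $r[$value['name']]['page_views'] += $value['count'];
        }
    }
    // Sum click counts per bucket name in the same way.
    if(count($c) > 0){
        foreach ($c as $value) {
            if(!isset($r[$value['name']]['clicks'])){
                $r[$value['name']]['clicks'] = 0;
            }
            $r[$value['name']]['clicks'] += $value['count'];
        }
    }

    $rf = array();

    // Build one row per bucket, defaulting a missing metric to 0.
    foreach ($r as $key => $value) {
        $tmp_clicks = isset($value['clicks']) ? $value['clicks'] : 0;
        $tmp_page_views = isset($value['page_views']) ? $value['page_views'] : 0;
        $rf[] = array(
                'name' => $key,
                'page_views' => $tmp_page_views,
                'clicks' => $tmp_clicks,
                'ctr' => ctr($tmp_clicks, $tmp_page_views) // ctr() is a helper not shown here
            );
    }

    return $rf;
}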

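$pv and $c are both arrays containing the aggregations returned by the Elasticsearch queries, which I have reformatted to make them easier to use.

My question is:

Is it possible to get multiple metrics (in my case page views and clicks, each defined by its own filters) and run the same aggregation on both, so that the returned aggregation looks something like this: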

{
    "data": [
        {
            "day": "2015-10-13",
            "page_views": 61,
            "clicks": 0
        },
        {
            "day": "2015-10-14",
            "page_views": 135,
            "clicks": 1
        },
        {
            "day": "2015-10-15",
            "page_views": 39,
            "clicks": 0
        }
    ]
}

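without my having to merge them manually?

1 Answer:

Answer 0: (score: 2)

Yes, this is definitely possible if you combine the aggregations into a single query. For example, I suppose you have a query like this for page views: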

{
    "query": {...},
    "aggregations": {
        "by_day": {
            "date_histogram": {
                "field": "day",
                "interval": "day"
            },
            "aggs": {
                "page_views_per_day": {
                    "sum": {
                        "field": "page_views"
                    }
                }
            }
        }
    }
}

And another query like this for clicks:

{
    "query": {...},
    "aggregations": {
        "by_day": {
            "date_histogram": {
                "field": "day",
                "interval": "day"
            },
            "aggs": {
                "clicks_per_day": {
                    "sum": {
                        "field": "clicks"
                    }
                }
            }
        }
    }
}

If both use the same constraints in the query, you can merge them at the date_histogram level, like this:

{
    "query": {...},
    "aggregations": {
        "by_day": {
            "date_histogram": {
                "field": "day",
                "interval": "day"
            },
            "aggs": {
                "page_views_per_day": {
                    "sum": {
                        "field": "page_views"
                    }
                },
                "clicks_per_day": {
                    "sum": {
                        "field": "clicks"
                    }
                }
            }
        }
    }
}

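UPDATE

Since the query is different for each of your aggregations, we need to do it slightly differently, namely by using an additional filters aggregation, like this: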

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "event": [
                  "page_view",
                  "struct"
                ]
              }
            }
          ],
          "should": {
            "term": {
              "se_action": "click"
            }
          },
          "must_not": {
            "term": {
              "br_family": "Robot"
            }
          }
        }
      }
    }
  },
  "aggs": {
    "dates": {
      "date_histogram": {
        "field": "collector_tstamp",
        "interval": "day"
      },
      "aggs": {
        "my_filters": {
          "filters": {
            "filters": {
              "page_views_filter": {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "event": "page_view"
                      }
                    }
                  ],
                  "must_not": {
                    "term": {
                      "br_family": "Robot"
                    }
                  }
                }
              },
              "clicks_filter": {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "event": "struct"
                      }
                    },
                    {
                      "term": {
                        "se_action": "click"
                      }
                    }
                  ],
                  "must_not": {
                    "term": {
                      "br_family": "Robot"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Now, for each daily bucket, you end up with two sub-buckets: one counting page views and the other counting clicks.
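
To get from that response to the flat shape asked for in the question, something like the following PHP sketch might work. It assumes the response has already been decoded into an associative array, that each daily bucket carries its start time as epoch milliseconds in "key", and that the named filters appear under "my_filters" as in the query above. The function name flatten_daily_metrics and the inline CTR calculation are illustrative only (the question's own ctr() helper is not shown), and the sample data is just the first two days from the question.

```php
<?php
// Hypothetical helper: turns the combined date_histogram + filters response
// into one row per day with page_views, clicks and a click-through rate.
function flatten_daily_metrics(array $response) {
    $rows = array();
    foreach ($response['aggregations']['dates']['buckets'] as $bucket) {
        // Named filters come back keyed by the names used in the query.
        $filters = $bucket['my_filters']['buckets'];
        $page_views = $filters['page_views_filter']['doc_count'];
        $clicks = $filters['clicks_filter']['doc_count'];
        $rows[] = array(
            // "key" is assumed to be epoch milliseconds; format it as a UTC date.
            'day' => gmdate('Y-m-d', (int) ($bucket['key'] / 1000)),
            'page_views' => $page_views,
            'clicks' => $clicks,
            // Assumed CTR definition: clicks / page views, 0 when there are no views.
            'ctr' => $page_views > 0 ? $clicks / $page_views : 0,
        );
    }
    return array('data' => $rows);
}

// Sample decoded response, trimmed to the fields the helper reads.
$sample = array('aggregations' => array('dates' => array('buckets' => array(
    array('key' => 1444694400000, 'my_filters' => array('buckets' => array(
        'page_views_filter' => array('doc_count' => 61),
        'clicks_filter' => array('doc_count' => 0),
    ))),
    array('key' => 1444780800000, 'my_filters' => array('buckets' => array(
        'page_views_filter' => array('doc_count' => 135),
        'clicks_filter' => array('doc_count' => 1),
    ))),
))));

$result = flatten_daily_metrics($sample);
echo json_encode($result), "\n";
```

Because both counts come from the same daily bucket, no separate merge step is needed; a day with no clicks simply reports a doc_count of 0 for clicks_filter.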