AWS Athena - Apply a filter, then compute percentiles

Date: 2018-06-13 13:12:58

Tags: amazon-web-services amazon-athena presto

I'm using AWS Athena to compute some metrics. I have a dataset like this:

sessionnumber
0
10
-1
10
2
-10
10

I'm trying to compute percentiles of this value, but only over the subset of valid values. A value is valid when sessionnumber >= 1, so I tried:

with testfun AS 
    (SELECT filter(array_agg(sessionnumber), x -> x >= 1) as validvalues 
     FROM "mydate")

SELECT approx_percentile(validvalues, 0.25) FROM testfun

But I got the following error:

SYNTAX_ERROR: line 17:10: Unexpected parameters (array(integer), double) for function approx_percentile. Expected: approx_percentile(bigint, double) , approx_percentile(bigint, bigint, double) , approx_percentile(bigint, bigint, double, double) , approx_percentile(bigint, array(double)) , approx_percentile(bigint, bigint, array(double)) , approx_percentile(double, double) , approx_percentile(double, bigint, double, double) , approx_percentile(double, bigint, double) , approx_percentile(double, array(double)) , approx_percentile(double, bigint, array(double)) , approx_percentile(real, double) , approx_percentile(real, bigint, double, double) , approx_percentile(real, bigint, double) , approx_percentile(real, array(double)) , approx_percentile(real, bigint, array(double))

I understand the error, but I can't find a way to do this in AWS Athena / PrestoDB. Is something like this even possible?

1 Answer:

Answer 0 (score: 3)

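I found a workaround and am sharing it here. As the error message shows, approx_percentile is an aggregate over scalar values (bigint, double, or real), not over an array, so the filter has to be applied to the rows before aggregating: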

WITH validValues AS (
    -- Keep only the valid rows, then aggregate the remaining scalar values
    SELECT approx_percentile(sessionnumber, ARRAY[0.25, 0.50, 0.75, 0.95, 0.99]) AS percentiles
    FROM "20180407"
    WHERE sessionnumber >= 1
)
SELECT percentiles FROM validValues
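
For anyone who wants to try this without a table, here is a minimal sketch of two other ways to express "filter, then take the percentile", using the sample values from the question as an inline VALUES list. The alias sample and the column names p25_valid, valid_rows, and v are made up for illustration. It assumes the Athena engine accepts the aggregate FILTER clause and CROSS JOIN UNNEST the way Presto does. Run the two statements separately.

```sql
-- Variant 1: aggregate FILTER clause.
-- Rows failing the predicate are ignored by that aggregate only,
-- so other aggregates in the same SELECT can still see every row.
SELECT
    approx_percentile(sessionnumber, 0.25) FILTER (WHERE sessionnumber >= 1) AS p25_valid,
    count(*) FILTER (WHERE sessionnumber >= 1) AS valid_rows  -- 4 of the 7 sample rows pass
FROM (VALUES 0, 10, -1, 10, 2, -10, 10) AS sample (sessionnumber);

-- Variant 2: keep the array-based CTE from the question,
-- then expand the filtered array back into rows with UNNEST
-- so that approx_percentile receives scalar values again.
WITH testfun AS (
    SELECT filter(array_agg(sessionnumber), x -> x >= 1) AS validvalues
    FROM (VALUES 0, 10, -1, 10, 2, -10, 10) AS sample (sessionnumber)
)
SELECT approx_percentile(v, 0.25) AS p25_valid
FROM testfun
CROSS JOIN UNNEST(validvalues) AS t (v);
```

Variant 1 is probably the closest to the WHERE-based query above and avoids building an array at all. Variant 2 is mainly worth considering when the filtered array is needed for something else in the same query, since array_agg over a large table can be expensive.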