PERCENTILE_CONT() returns the same value regardless of the input argument

Asked: 2018-04-12 20:57:00

Tags: sql memsql

I want to get the 5th, 50th, and 95th percentiles of a table:

SELECT col1, col2, col3, AVG(col4), STD(col4), 
    PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY col4) 
        OVER (PARTITION BY col1, col2, col3) as 5th_percentile, 
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4)  
        OVER (PARTITION BY col1, col2, col3) as 50th_percentile, 
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY col4)  
        OVER (PARTITION BY col1, col2, col3) as 95th_percentile
FROM table
GROUP BY col1, col2, col3
LIMIT 100

What I end up with is 5th_percentile == 50th_percentile == 95th_percentile:

AVG(col4)   STD(col4)   5th_percentile   50th_percentile  95th_percentile
300.000000  0.000000    300.000000       300.000000       300.000000
67.076600   16.968851   82.031792        82.031792        82.031792
66.166136   11.452172   78.348846        78.348846        78.348846
544.262809  68.269014   605.797302       605.797302       605.797302
22.523138   1.820358    24.000000        24.000000        24.000000

What is going on?

Edit: the database is MemSQL.

3 Answers:

Answer 0: (score: 2)

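Window functions run after the GROUP BY clause. GROUP BY produces one row per group, so each window partition contains only a single row, and that is why the PERCENTILE_CONT window functions all return the same value.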

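You want to compute the window functions first and do the GROUP BY afterwards. You can do this by putting the window functions in an inner subselect and the GROUP BY in the outer select.

Here is the Postgres documentation explaining how window functions relate to GROUP BY (this is standard ANSI SQL, and MemSQL behaves the same way):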
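A minimal sketch of that restructuring, not tested against MemSQL. The table name `t` and the aliases `w`, `avg_col4`, `std_col4`, `p05`, `p50`, and `p95` are hypothetical names chosen for illustration. Because each percentile is constant within a partition, wrapping it in MAX() in the outer query simply picks that one value.

```sql
-- Sketch only: `t`, `w`, and all column aliases are hypothetical names.
SELECT col1, col2, col3,
       AVG(col4) AS avg_col4,
       STD(col4) AS std_col4,
       -- Each percentile is identical on every row of a partition,
       -- so MAX() just collapses it to that single value.
       MAX(p05) AS p05,
       MAX(p50) AS p50,
       MAX(p95) AS p95
FROM (
    -- Inner subselect: no GROUP BY here, so the window functions
    -- see every raw row of each (col1, col2, col3) partition.
    SELECT col1, col2, col3, col4,
           PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY col4)
               OVER (PARTITION BY col1, col2, col3) AS p05,
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4)
               OVER (PARTITION BY col1, col2, col3) AS p50,
           PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY col4)
               OVER (PARTITION BY col1, col2, col3) AS p95
    FROM t
) AS w
GROUP BY col1, col2, col3
LIMIT 100;
```

Keeping col4 in the inner select is what lets AVG and STD still be computed over the raw rows in the outer query.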

https://www.postgresql.org/docs/current/static/tutorial-window.html

  

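The rows considered by a window function are those of the "virtual table" produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.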
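A quick way to check this on your own data is to count the rows each window partition holds once GROUP BY has been applied. This is only a diagnostic sketch: the table name `t` and the alias `rows_in_partition` are hypothetical. If the window functions really do run on the grouped rows, every row should report a count of 1.

```sql
-- Diagnostic sketch; `t` and `rows_in_partition` are hypothetical names.
-- GROUP BY collapses each (col1, col2, col3) group to one row first,
-- so the window defined by OVER only ever sees that single row.
SELECT col1, col2, col3,
       COUNT(*) OVER (PARTITION BY col1, col2, col3) AS rows_in_partition
FROM t
GROUP BY col1, col2, col3
LIMIT 100;
```

With only one row in the partition, any percentile of col4 is just that row's value, whatever fraction is requested.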

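Note that in MemSQL, if you use a column that is neither grouped nor aggregated (such as col4 in your query), an arbitrary value is taken from the rows in the group, i.e. it behaves like the ANY_VALUE aggregate. In a future version of MemSQL this query will return an error, to help you avoid writing queries with this kind of unexpected behavior.

Answer 1: (score: 0)

PERCENTILE_CONT(), at least in some databases, can be either an aggregate function or a window function.

I think what is happening is that the value is being calculated after the aggregation, though I don't know why. To be honest, I would expect the code to get a syntax error, because col4 is not aggregated. In other words, (ORDER BY MAX(col4)) should work, but not (ORDER BY col4), because the percentile is calculated after the aggregation.

But try it without the OVER clause: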

SELECT col1, col2, col3, AVG(col4), STD(col4), 
       PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY col4)  as 5th_percentile, 
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) as 50th_percentile, 
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY col4) as 95th_percentile
FROM table
GROUP BY col1, col2, col3
LIMIT 100;

EDIT:

It appears that your database does not support PERCENTILE_CONT() as an aggregate function. There is no accounting for taste. Most do.

A workaround is SELECT DISTINCT:

SELECT DISTINCT col1, col2, col3,
       AVG(col4) OVER (PARTITION BY col1, col2, col3),
       STD(col4) OVER (PARTITION BY col1, col2, col3),
       PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1, col2, col3)  as 5th_percentile, 
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1, col2, col3) as 50th_percentile, 
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1, col2, col3) as 95th_percentile
FROM table
LIMIT 100;

Or use a subquery.

Answer 2: (score: 0)

WITH a AS (
SELECT col1, col2, col3, 
        PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY col4) 
            OVER (PARTITION BY col1, col2, col3) as 5th_percentile,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) 
            OVER (PARTITION BY col1, col2, col3) as 50th_percentile,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY col4) 
            OVER (PARTITION BY col1, col2, col3) as 95th_percentile
FROM table
)
SELECT DISTINCT col1, col2, col3, 5th_percentile, 50th_percentile, 95th_percentile
FROM a
LIMIT 100

This works. It looks like you cannot do a GROUP BY together with percentile_cont.