Using MEDIAN with GROUP BY

Date: 2018-10-31 21:49:57

Tags: mariadb median

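MariaDB has had a MEDIAN function since version 10.3.3. Unfortunately, I run into a small problem when I try to use it together with a GROUP BY clause (I am currently on v10.3.9).

Given the following table: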

CREATE TABLE testmed
  (
     id       INT NOT NULL auto_increment,
          PRIMARY KEY(id),
     group_id INT NOT NULL DEFAULT 0,
     score    INT NOT NULL DEFAULT 0
  ); 

Populate it with some data:

INSERT INTO testmed (group_id, score) 
VALUES (1,1), (1,2), (1,2), (1,2), (1,3), (2,5), (2,7), (2,9), (2,11), (2,11);

Now I get different results depending on whether or not the query includes GROUP BY:

MariaDB [test]> SELECT group_id, score, MEDIAN(score) OVER (PARTITION BY group_id) FROM testmed;
+----------+-------+--------------------------------------------+
| group_id | score | MEDIAN(score) OVER (PARTITION BY group_id) |
+----------+-------+--------------------------------------------+
|        1 |     1 |                               2.0000000000 |
|        1 |     2 |                               2.0000000000 |
|        1 |     2 |                               2.0000000000 |
|        1 |     2 |                               2.0000000000 |
|        1 |     3 |                               2.0000000000 |
|        2 |     5 |                               9.0000000000 |
|        2 |     7 |                               9.0000000000 |
|        2 |     9 |                               9.0000000000 |
|        2 |    11 |                               9.0000000000 |
|        2 |    11 |                               9.0000000000 |
+----------+-------+--------------------------------------------+
10 rows in set (0.000 sec)
MariaDB [test]> SELECT group_id, score, MEDIAN(score) OVER (PARTITION BY group_id) FROM testmed GROUP BY group_id;
+----------+-------+--------------------------------------------+
| group_id | score | MEDIAN(score) OVER (PARTITION BY group_id) |
+----------+-------+--------------------------------------------+
|        1 |     1 |                               1.0000000000 |
|        2 |     5 |                               5.0000000000 |
+----------+-------+--------------------------------------------+

The first result is correct, but why doesn't it work properly with GROUP BY? For now, I am nesting the queries like this:

MariaDB [test]> SELECT * FROM (SELECT group_id, score, MEDIAN(score) OVER (PARTITION BY group_id) FROM testmed) t GROUP BY group_id;
+----------+-------+--------------------------------------------+
| group_id | score | MEDIAN(score) OVER (PARTITION BY group_id) |
+----------+-------+--------------------------------------------+
|        1 |     1 |                               2.0000000000 |
|        2 |     5 |                               9.0000000000 |
+----------+-------+--------------------------------------------+
2 rows in set (0.000 sec)

But doing it that way feels very wrong.

What is the correct way to do this?

1 answer:

Answer 0: (score: 0)

Your second query is technically invalid:

SELECT
    group_id,
    score,
    MEDIAN(score) OVER (PARTITION BY group_id)
FROM testmed
GROUP BY group_id;

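It is invalid because you are selecting score, which does not appear in the GROUP BY clause. The question is: which value of score do you intend the database to use for each group_id? What appears to be happening is that MariaDB arbitrarily picks one score per group_id, here the smallest. Since each group then has only a single score value, the median simply returns that value.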
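To make this concrete, here is a rough sketch that emulates what the invalid query appears to do, using the testmed table from the question. It is only an approximation: MIN(score) is an assumption standing in for whatever value MariaDB happens to pick, and the alias collapsed is purely illustrative.

```sql
-- Step 1 (inner query): GROUP BY collapses each group to a single row.
-- MIN(score) is an assumed stand-in for MariaDB's arbitrary choice of score.
-- Step 2 (outer query): the window function runs over those collapsed rows,
-- so every partition contains exactly one score.
SELECT group_id,
       score,
       MEDIAN(score) OVER (PARTITION BY group_id) AS score_median
FROM (
    SELECT group_id, MIN(score) AS score
    FROM testmed
    GROUP BY group_id
) AS collapsed;
```

With one row per partition, the median can only be that row's own score, which matches the 1 and 5 seen in the question's second result set.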

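Keep in mind that analytic functions are evaluated after the GROUP BY aggregation happens. I think this is the query you intended to run:

SELECT DISTINCT
    group_id,
    MEDIAN(score) OVER (PARTITION BY group_id) score_median
FROM testmed;

If this doesn't work because MariaDB doesn't like using DISTINCT together with MEDIAN, then you can try a subquery.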
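A subquery version might look something like the sketch below. It is a minimal illustration, not a definitive form: it assumes the testmed table from the question, and the derived-table alias t is arbitrary. The inner query computes the windowed median for every row, and the outer query removes the duplicates.

```sql
-- Inner query: one row per original row, each carrying its group's median.
-- Outer query: DISTINCT reduces that to one row per group_id.
SELECT DISTINCT group_id, score_median
FROM (
    SELECT group_id,
           MEDIAN(score) OVER (PARTITION BY group_id) AS score_median
    FROM testmed
) AS t;
```

Unlike the nested query in the question, this does not select score in the outer query, so there is no ungrouped column whose value the database has to choose arbitrarily.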