Question

Sample table:

age | fruit | number_bought |
20  | apple | 3000000       |
20  | apple | 20            |
20  | apple | 60            |
20  | apple | 30            |
20  | apple | 50            |
20  | apple | 4             |
20  | banana| 40            |
30  | grape | 400           |
30  | grape | 450           |
30  | grape | 500           |

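It simply lists how many of a particular fruit a person of a particular age bought at that age.

Now I need to sort this table by "most popular fruit", grouped by age and fruit.

Here is the tricky part: I want to use the MEDIAN to measure popularity rather than the plain average, because some buyers may be far from typical (perhaps a salesman), like the 3000000 in the example above, while the average "20-year-old", as the example shows, buys far fewer.

The table above, sorted by median popularity, should look like this: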

age | fruit | median |
30  | grape | 450    |
20  | apple | 40     |
20  | banana| 40     |

Now, if I had just used "average", 20, apple would have won on popularity solely because of one salesman. That is why I want to use the median.
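To make the mean-versus-median difference concrete, here is a minimal cross-check of the expected result, written in Python rather than SQL so it can be run without a database. The names `ROWS` and `summarize` are made up for this illustration; the data is copied from the sample table above.

```python
from collections import defaultdict
from statistics import mean, median

# Sample rows from the question: (age, fruit, number_bought)
ROWS = [
    (20, "apple", 3000000),
    (20, "apple", 20),
    (20, "apple", 60),
    (20, "apple", 30),
    (20, "apple", 50),
    (20, "apple", 4),
    (20, "banana", 40),
    (30, "grape", 400),
    (30, "grape", 450),
    (30, "grape", 500),
]


def summarize(rows):
    """Return (age, fruit, mean, median) tuples, one per (age, fruit) group."""
    groups = defaultdict(list)
    for age, fruit, bought in rows:
        groups[(age, fruit)].append(bought)
    return [
        (age, fruit, mean(values), median(values))
        for (age, fruit), values in groups.items()
    ]


summary = summarize(ROWS)
# Sort descending by median (index 3) and, for comparison, by mean (index 2).
by_median = sorted(summary, key=lambda r: r[3], reverse=True)
by_mean = sorted(summary, key=lambda r: r[2], reverse=True)

for age, fruit, avg, med in by_median:
    print(age, fruit, med)
```

Sorted by median, grape comes first and apple and banana tie at 40, matching the table above. Sorted by mean, the single 3000000 purchase pushes 20, apple to the top.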

Answer 1

The common median queries seem to struggle when there is an even number of items (as your test data has for apple).

A simple approach is:

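-- Self-join each (age, fruit) group. SUM(SIGN(1-SIGN(...))) counts the rows in y
-- with number_bought <= x.number_bought, i.e. the rank of x within its group.
SELECT y.age, x.fruit, AVG(x.number_bought) AS number_bought
FROM data x
INNER JOIN data y
ON x.age = y.age
AND x.fruit = y.fruit
GROUP BY y.age, x.fruit, x.number_bought
HAVING SUM(SIGN(1-SIGN(y.number_bought-x.number_bought))) = FLOOR((COUNT(*)+1)/2)
ORDER BY number_bought DESC;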

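This is not strictly accurate, since it just takes a single middle record (i.e., the median of 6 records lies at position 3.5, but this uses FLOOR and so returns record 3).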
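The rank-counting idea can be sketched outside the database. The Python below is only an illustration of the arithmetic, not a translation of how MySQL executes the query; `rank` and `lower_middle` are hypothetical helper names, and it assumes the values within a group are distinct, as they are in the sample data.

```python
import math

# number_bought values for (20, apple) from the question's sample table
APPLES = [3000000, 20, 60, 30, 50, 4]


def rank(value, values):
    """Count the values <= value, like SUM(SIGN(1-SIGN(y - x))) in the query."""
    return sum(1 for other in values if other <= value)


def lower_middle(values):
    """Return the value whose rank equals FLOOR((COUNT(*)+1)/2)."""
    target = math.floor((len(values) + 1) / 2)
    return next(v for v in values if rank(v, values) == target)


print(lower_middle(APPLES))
```

For the six apple rows the target rank is FLOOR(3.5) = 3, so this picks 30 rather than the true median of 40.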

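The following may be slightly more accurate: when there is an even number of records, it takes the average of the 2 middle records.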

-- The inner query keeps the row(s) at the FLOOR and CEIL middle ranks;
-- the outer AVG merges them into one value per (age, fruit).
SELECT age, fruit, AVG(number_bought) AS number_bought
FROM 
(
    SELECT y.age, x.fruit, AVG(x.number_bought) AS number_bought
    FROM data x
    INNER JOIN data y
    ON x.age = y.age
    AND x.fruit = y.fruit
    GROUP BY y.age, x.fruit, x.number_bought
    HAVING SUM(SIGN(1-SIGN(y.number_bought-x.number_bought))) = FLOOR((COUNT(*)+1)/2)
    OR SUM(SIGN(1-SIGN(y.number_bought-x.number_bought))) = CEIL((COUNT(*)+1)/2)
) Sub1
GROUP BY age, fruit
ORDER BY number_bought DESC;
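Why pairing FLOOR with CEIL works can be checked with a few lines of arithmetic. This is a hedged Python sketch with made-up helper names (`middle_positions`, `median_by_rank`); it mirrors the two rank conditions rather than the query's execution, and again assumes distinct values within each group.

```python
import math


def middle_positions(count):
    """Return the 1-based ranks matched by the FLOOR and CEIL conditions."""
    middle = (count + 1) / 2
    return math.floor(middle), math.ceil(middle)


def median_by_rank(values):
    """Average the values at the FLOOR and CEIL ranks (distinct values assumed)."""
    ordered = sorted(values)
    low, high = middle_positions(len(ordered))
    # For an odd count low == high, so the average is just the middle value.
    return (ordered[low - 1] + ordered[high - 1]) / 2


apples = [3000000, 20, 60, 30, 50, 4]
grapes = [400, 450, 500]

print(middle_positions(len(apples)), median_by_rank(apples))
print(middle_positions(len(grapes)), median_by_rank(grapes))
```

With an odd count both conditions select the same rank, so nothing changes; with an even count they select the two ranks on either side of the middle, and averaging them gives 40 for the apple rows.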

SQL Fiddle here:

http://www.sqlfiddle.com/#!2/f1b49/13
