I have data in a MySQL database that looks like this:
name |score
----------
alice|60
mary |55
...
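A name can appear in the list many times, or only once. I want to sort the list by the lower bound of the 95% confidence interval for each name. I tried the following: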
SELECT name, count(*) as count_n, stddev_samp(score) as stdv, avg(score) as mean
FROM `my.table`
GROUP BY name
ORDER BY avg(score) - 1.96 * stddev_samp(score) / sqrt(count(*)) DESC
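This produces sensible output. Ideally, I would like to vary the value 1.96, because it should depend on count_n for that name. Strictly, it should be the critical value of the t-distribution with count_n-1 degrees of freedom. Is there a MySQL function that can do this for me?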
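For reference, the bound being asked for can be written out as below. The symbols are introduced here only for illustration: \bar{x} stands for avg(score), s for stddev_samp(score), and n for count_n. The constant 1.96 is the limit that the t critical value approaches as n grows, which is why the fixed-constant query is too optimistic for names with only a few rows.

```latex
\mathrm{LCB} = \bar{x} - t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}},
\qquad
t_{0.975,\,n-1} \to 1.96 \quad \text{as } n \to \infty
```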
I saw the following answer, which looks good, but it does not vary the value the way I want.
Answer 0 (score: 0)
I solved my problem by creating a separate table, 'tdistribution', with the following structure:
dof | tvalue
------------
1 | -12.706
2 | -4.3026
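It holds the degrees of freedom and the corresponding t-value. The values are stored as the negative (2.5%) quantile, which is why the query below wraps them in abs(). This table can then be joined to a query in the style of the original one.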
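A minimal sketch of how such a lookup table might be set up. The column types and the primary key are assumptions, not taken from the original post. The first two rows are the values shown above; rows 3 to 5 are filled in from a standard two-sided 95% t-table and follow the same negative-sign convention.

```sql
-- Hypothetical definition: column types are an assumption.
CREATE TABLE tdistribution (
    dof    INT          NOT NULL PRIMARY KEY,  -- degrees of freedom (n - 1)
    tvalue DECIMAL(8,4) NOT NULL               -- 2.5% quantile of the t-distribution
);

INSERT INTO tdistribution (dof, tvalue) VALUES
    (1, -12.706),   -- from the post
    (2, -4.3026),   -- from the post
    (3, -3.1824),   -- standard t-table value
    (4, -2.7764),   -- standard t-table value
    (5, -2.5706);   -- standard t-table value
```

The table needs one row for every value of count - 1 that can occur in the data; any name whose degrees of freedom are missing from it will not find a match in the join.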
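SELECT table2.name,
       -- lower and upper bounds: mean -/+ |t| * s / sqrt(n)
       round(table2.mean - abs(tdistribution.tvalue * table2.stdv / sqrt(table2.nn)), 2) AS LCB,
       round(table2.mean + abs(tdistribution.tvalue * table2.stdv / sqrt(table2.nn)), 2) AS UCB
FROM
    -- per-name sample size, mean, and sample standard deviation
    (SELECT table1.name, count(table1.name) AS nn, avg(table1.score) AS mean, stddev_samp(table1.score) AS stdv
     FROM
         (SELECT name, score FROM my.table) AS table1
     GROUP BY name
    ) AS table2
-- look up the t-value for nn - 1 degrees of freedom
LEFT JOIN tdistribution
    ON table2.nn - 1 = tdistribution.dof
-- a single observation has no sample standard deviation
WHERE nn > 1
ORDER BY LCB DESC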
It seems to work!
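One caveat with the LEFT JOIN: a name whose degrees of freedom are not present in tdistribution gets a NULL t-value, and therefore a NULL bound. A possible variant, sketched below, falls back to the normal-approximation constant 1.96 in that case. The fallback is an assumption added here, not part of the original solution, and it is only reasonable if the rows missing from the lookup table are those with large degrees of freedom. The aliases g and t are illustrative.

```sql
SELECT g.name,
       -- use the looked-up t-value when present, otherwise fall back to 1.96
       ROUND(g.mean - COALESCE(ABS(t.tvalue), 1.96) * g.stdv / SQRT(g.nn), 2) AS LCB
FROM (
    -- per-name sample size, mean, and sample standard deviation
    SELECT name,
           COUNT(*)           AS nn,
           AVG(score)         AS mean,
           STDDEV_SAMP(score) AS stdv
    FROM my.table
    GROUP BY name
) AS g
LEFT JOIN tdistribution AS t
       ON t.dof = g.nn - 1
WHERE g.nn > 1
ORDER BY LCB DESC;
```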