I'm using a SQL query to compute z-scores, (x - μ) / σ, for several columns.
Specifically, I have a table like this:
my_table
id col_a col_b col_c
1 3 6 5
2 5 3 3
3 2 2 9
4 9 8 2
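...and I want to select the z-score of each number in each row, based on the mean and standard deviation of its column.
So the result would look like this: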
id col_d col_e col_f
1 -0.4343 1.0203 ...
2 0.1434 -0.8729
3 -0.8234 -1.2323
4 1.889 1.5343
Currently my code computes the scores for two columns, as follows:
select id,
    (my_table.col_a - avg(mya.col_a)) / stddev(mya.col_a) as col_d,
    (my_table.col_b - avg(myb.col_b)) / stddev(myb.col_b) as col_e
from my_table,
    (select col_a from my_table) mya,
    (select col_b from my_table) myb
group by id;
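However, this is very slow. I've been waiting three minutes for the query.
Is there a better way to achieve this? I'm using postgres, but an answer in any general language would help me too. Thanks!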
Answer 0 (score: 15)
You can use window functions like this:
select
    t.id,
    (t.col_a - avg(t.col_a) over()) / stddev(t.col_a) over() as col_d,
    (t.col_b - avg(t.col_b) over()) / stddev(t.col_b) over() as col_e
from my_table as t
Or cross join with a precomputed avg and stddev:
select
    t.id,
    (t.col_a - tt.col_a_avg) / tt.col_a_stdev as col_d,
    (t.col_b - tt.col_b_avg) / tt.col_b_stdev as col_e
from my_table as t
    cross join (
        select
            avg(tt.col_a) as col_a_avg,
            avg(tt.col_b) as col_b_avg,
            stddev(tt.col_a) as col_a_stdev,
            stddev(tt.col_b) as col_b_stdev
        from my_table as tt
    ) as tt
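One caveat applies to either form: if a column holds the same value in every row, its standard deviation is 0 and the division raises an error. The following is a minimal sketch of the window-function variant with a guard, assuming PostgreSQL and assuming a NULL z-score is acceptable for such a column. The window name w is only illustrative.

```sql
-- Sketch only: NULLIF turns a zero standard deviation into NULL,
-- so the division yields NULL instead of a division-by-zero error.
select
    t.id,
    (t.col_a - avg(t.col_a) over w) / nullif(stddev(t.col_a) over w, 0) as col_d,
    (t.col_b - avg(t.col_b) over w) / nullif(stddev(t.col_b) over w, 0) as col_e
from my_table as t
window w as ();  -- empty window: every row sees the whole table
```

In PostgreSQL, stddev is an alias for stddev_samp (the sample standard deviation); stddev_pop could be swapped in if the population form is wanted.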
Answer 1 (score: 0)
Use a WITH clause:
WITH stats AS ( SELECT avg ( col_a ) a_avg, stddev ( col_a ) a_stddev,
avg ( col_b ) b_avg, stddev ( col_b ) b_stddev
FROM my_table
)
SELECT id, ( col_a - a_avg) / a_stddev col_d,
( col_b - b_avg) / b_stddev col_e
FROM my_table, stats
But I prefer Roman's window-function solution.
For Oğuz: handling NULL values in my_table:
WITH stats AS (
SELECT avg ( col_a ) a_avg, stddev ( col_a ) as a_stddev,
avg ( col_b ) b_avg, stddev ( col_b ) as b_stddev
FROM my_table
)
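-- avg() and stddev() skip NULL inputs; a NULL col_a or col_b gives a NULL z-score,
-- and COALESCE ( ..., NULL ) leaves that NULL unchanged
SELECT id,
COALESCE ( ( col_a - a_avg) / a_stddev, NULL ) col_d,
COALESCE ( ( col_b - b_avg) / b_stddev, NULL ) col_e
FROM my_table, stats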
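If the aim is instead to give rows with a NULL input a fallback score, the second COALESCE argument has to be a real value. A minimal sketch, assuming PostgreSQL and assuming that 0 (that is, "at the column mean") is an acceptable fallback, which may not suit every analysis:

```sql
WITH stats AS (
    -- aggregates ignore NULL inputs, so the statistics cover non-NULL rows only
    SELECT avg(col_a) AS a_avg, stddev(col_a) AS a_stddev,
           avg(col_b) AS b_avg, stddev(col_b) AS b_stddev
    FROM my_table
)
SELECT id,
       -- assumed fallback: a NULL z-score is reported as 0
       COALESCE((col_a - a_avg) / a_stddev, 0) AS col_d,
       COALESCE((col_b - b_avg) / b_stddev, 0) AS col_e
FROM my_table, stats;
```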
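Answer 2 (score: -2)
I would first select the avg() and stddev() values into a table variable, then use that table for the calculation.
So you would get a table variable with the following columns: AVG_col_a, stddev_col_a, AVG_col_b, stddev_col_b ......
Something like this: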
-- Table variables are SQL Server (T-SQL) syntax, where the function is STDEV
DECLARE @Table AS TABLE (AVG_col_a float, stddev_col_a float, AVG_col_b float, stddev_col_b float ......)
INSERT INTO @Table
SELECT AVG(col_a), STDEV(col_a), .......
FROM myTable
SELECT (m.col_a - AVG_col_a) / stddev_col_a AS col_d,
       (m.col_b - AVG_col_b) / stddev_col_b AS col_e
FROM myTable m, @Table