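Calculating z-scores for several columns

Time: 2013-10-09 18:00:19

Tags: sql postgresql

I am using a SQL query to compute the z-score, (x - μ) / σ, of several columns.

Specifically, I have a table like this: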

my_table
id    col_a  col_b  col_c
1     3      6      5
2     5      3      3
3     2      2      9
4     9      8      2

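...and I want to select the z-score of every number in every row, based on the mean and standard deviation of its column.

So the result would look like this: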

id    col_d     col_e     col_f
1    -0.4343    1.0203    ...
2     0.1434   -0.8729
3    -0.8234   -1.2323
4     1.889     1.5343

At the moment my code computes the scores for two columns, like this:

select id,
   (my_table.col_a - avg(mya.col_a)) / stddev(mya.col_a) as col_d,
   (my_table.col_b - avg(myb.col_b)) / stddev(myb.col_b) as col_e
from my_table,
   (select col_a from my_table) mya,
   (select col_b from my_table) myb
group by id;

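However, this is very slow. I have been waiting three minutes for the query to finish.

Is there a better way to achieve this? I am using postgres, but a general approach in any SQL dialect would help me. Thanks!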

3 Answers:

Answer 0: (score: 15)

You can use window functions like this:

select
    t.id,
    -- PostgreSQL spells the aggregate stddev (stdev is the SQL Server name)
    (t.col_a - avg(t.col_a) over()) / stddev(t.col_a) over() as col_d,
    (t.col_b - avg(t.col_b) over()) / stddev(t.col_b) over() as col_e
from my_table as t
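
To try the window-function approach without touching a real table, a minimal sketch might inline the question's four sample rows with a VALUES list. The CTE name sample_rows and the round(..., 4) calls are assumptions added for illustration and are not part of the original answer. Note that stddev in PostgreSQL is the sample standard deviation (an alias for stddev_samp); stddev_pop would be the choice if the population form is wanted.

```sql
-- Sketch only: sample_rows is a hypothetical stand-in for my_table.
with sample_rows (id, col_a, col_b, col_c) as (
    values (1, 3, 6, 5),
           (2, 5, 3, 3),
           (3, 2, 2, 9),
           (4, 9, 8, 2)
)
select
    s.id,
    -- over () makes every row see the aggregate of the whole set
    round((s.col_a - avg(s.col_a) over ()) / stddev(s.col_a) over (), 4) as col_d,
    round((s.col_b - avg(s.col_b) over ()) / stddev(s.col_b) over (), 4) as col_e,
    round((s.col_c - avg(s.col_c) over ()) / stddev(s.col_c) over (), 4) as col_f
from sample_rows as s
order by s.id;
```

Extending the pattern to more columns only means adding one more expression per column; no extra joins or subqueries are involved.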

Or cross join with precomputed averages and standard deviations:

select
    t.id,
    (t.col_a - tt.col_a_avg) / tt.col_a_stdev as col_d,
    (t.col_b - tt.col_b_avg) / tt.col_b_stdev as col_e
from my_table as t
    cross join (
        select 
            avg(tt.col_a) as col_a_avg,
            avg(tt.col_b) as col_b_avg,
            stddev(tt.col_a) as col_a_stdev,
            stddev(tt.col_b) as col_b_stdev
        from my_table as tt
   ) as tt
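
As for why the query in the question is so slow: its FROM clause lists my_table and two subqueries on my_table with no join condition, so each comma acts as a cross join and the aggregates run over every combination of rows. The following sketch, which assumes the same my_table as in the question, counts those combinations; the alias joined_rows is made up for illustration.

```sql
-- Sketch: count the rows the question's FROM clause produces before GROUP BY.
-- With no join condition, an N-row table yields N * N * N combinations
-- (4 * 4 * 4 = 64 for the 4-row sample in the question).
select count(*) as joined_rows
from my_table,
     (select col_a from my_table) mya,
     (select col_b from my_table) myb;
```

Both queries above avoid this blow-up because the statistics are either computed as window aggregates or collapsed to a single row before being joined.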

Answer 1: (score: 0)

Use a WITH clause:

WITH stats AS ( SELECT avg ( col_a ) a_avg, stddev ( col_a ) a_stddev,
                       avg ( col_b ) b_avg, stddev ( col_b ) b_stddev
                    FROM my_table 
              )
SELECT id, ( col_a - a_avg) / a_stddev col_d, 
           ( col_b - b_avg) / b_stddev col_e
    FROM my_table, stats

But I prefer Roman's window-function solution.

For Oğuz: handling NULL values in my_table:

WITH stats AS ( 
              SELECT avg ( col_a ) a_avg, stddev ( col_a ) as a_stddev,
                     avg ( col_b ) b_avg, stddev ( col_b ) as b_stddev
                  FROM my_table 
              )
SELECT id, 
       -- a NULL input already yields a NULL z-score; COALESCE ( x, NULL ) returns x unchanged
       COALESCE ( ( col_a - a_avg) / a_stddev, NULL ) col_d, 
       COALESCE ( ( col_b - b_avg) / b_stddev, NULL ) col_e
FROM my_table, stats
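
One case the NULL handling above does not cover is a column whose values are all identical: stddev is then 0 and the division raises a division-by-zero error in PostgreSQL. A possible guard, offered as a sketch rather than as part of the original answer, wraps the divisor in NULLIF so the z-score comes back as NULL instead:

```sql
-- Sketch: same stats CTE as above, with a guard against a zero standard deviation.
WITH stats AS (
    SELECT avg(col_a) AS a_avg, stddev(col_a) AS a_stddev,
           avg(col_b) AS b_avg, stddev(col_b) AS b_stddev
    FROM my_table
)
SELECT id,
       -- NULLIF(x, 0) yields NULL when x = 0, so the division returns NULL instead of failing
       (col_a - a_avg) / NULLIF(a_stddev, 0) AS col_d,
       (col_b - b_avg) / NULLIF(b_stddev, 0) AS col_e
FROM my_table CROSS JOIN stats;
```

Whether NULL is the right result for a constant column depends on how the scores are used downstream; substituting 0 with COALESCE around the whole expression is another option.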

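Answer 2: (score: -2)

I would first select the avg() and stddev() values into a table variable, and then use that table for the calculation.

So you would end up with a table variable containing the columns AVG_col_a, stddev_col_a, AVG_col_b, stddev_col_b ......

Something like this: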

DECLARE @Table as table (AVG_col_a, stddev_col_a, AVG_col_b, stddev_col_b ......)
INSERT into @Table
SELECT AVG(col_A), stddev(col_a), .......
FROM myTable

SELECT (m.col_a-AVG_col_a)/stddev_col_a as col_d,
       (m.col_b-AVG_col_b)/stddev_col_b as col_e
 FROM myTable m, @Table