I have a bunch of data in a database.
'Year', 'Disabled', 'non-disabled'
1990, 5, 3
1991, 2, 1
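I want to find the mean of each column across all years: the mean of the disabled numbers and the mean of the non-disabled numbers. Then I want to compare the two means with a t-test to see whether they differ significantly.
Can I do this? Does PostgreSQL have a way to take the two sets of values and return a p-value from a t-test?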
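Answer 0 (score: 0)
This is an old question, but I had to do the same thing and couldn't find a direct answer, so I wrote my own query for it. If I have made any mistakes in the statistical interpretation, please help me improve it.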
-- Placeholders: replace my_table, column1 and column2 with your own names.
WITH table_column1_stats_CTE AS (
    SELECT avg(column1) AS _mean,
           stddev(column1) AS _stddev,
           stddev(column1) / sqrt(count(*)) AS _se,
           count(*) - 1 AS _df
    FROM my_table
), table_column2_stats_CTE AS (
    SELECT avg(column2) AS _mean
    FROM my_table
), t_value_CTE AS (
    -- t statistic built from column1's standard error only.
    SELECT _df,
           abs(a._mean - b._mean) / _se AS t_value
    FROM table_column1_stats_CTE a, table_column2_stats_CTE b
), all_results_CTE AS (
    -- Rank the rows of the critical-value table by how close their df is to ours.
    SELECT *,
           row_number() OVER (ORDER BY abs(a.df - _df) ASC) AS rank
    FROM t_test_table a, t_value_CTE b
)
SELECT
    -- A larger t exceeds stricter critical values, so test the strictest first.
    CASE WHEN t_value >= p01 THEN 0.99
         WHEN t_value >= p05 THEN 0.95
         WHEN t_value >= p1 THEN 0.9
         WHEN t_value >= p2 THEN 0.8
         ELSE 0.0
    END AS significance
FROM all_results_CTE
WHERE rank = 1
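On the statistics: the query above divides the difference of the two means by the standard error of column1 alone, which treats column2's mean as a fixed reference value (effectively a one-sample test). If the two columns are meant to be compared as two samples, one option is Welch's t statistic, which uses both variances. The following is only a sketch under that assumption; the inline sample_data rows come from the question, and the column names disabled and non_disabled are placeholders.

```sql
-- Sketch: Welch's two-sample t statistic and its approximate degrees of freedom.
-- sample_data stands in for the real table; replace it with your own.
WITH sample_data (year, disabled, non_disabled) AS (
    VALUES (1990, 5, 3),
           (1991, 2, 1)
), stats AS (
    SELECT avg(disabled)          AS mean1,
           avg(non_disabled)      AS mean2,
           var_samp(disabled)     AS var1,
           var_samp(non_disabled) AS var2,
           count(disabled)        AS n1,
           count(non_disabled)    AS n2
    FROM sample_data
)
SELECT abs(mean1 - mean2) / sqrt(var1 / n1 + var2 / n2) AS t_value,
       -- Welch-Satterthwaite approximation; usually not a whole number.
       power(var1 / n1 + var2 / n2, 2)
         / (power(var1 / n1, 2) / (n1 - 1)
            + power(var2 / n2, 2) / (n2 - 1))           AS welch_df
FROM stats;
```

Because welch_df is generally fractional, it can still be matched against a critical-value table by nearest df, as the ranking step in the query above does. Since both columns here are measured in the same years, a paired t-test on the per-year differences may be the better fit; which test is appropriate depends on the data.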
I would attach t_test_table if I could, but you can create it by copying the values from: https://jimgrange.wordpress.com/2015/12/05/statistics-tables-where-do-the-numbers-come-from/
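As a rough illustration of what that table might look like, here is a minimal sketch. The schema is an assumption inferred from the columns the query reads (df, p2, p1, p05, p01), and the rows are the standard two-tailed critical values of Student's t for the first five degrees of freedom; check them against your own source and extend the table as needed.

```sql
-- Hypothetical layout for t_test_table; column meanings are assumed, not confirmed.
CREATE TABLE t_test_table (
    df  integer PRIMARY KEY,  -- degrees of freedom
    p2  numeric,              -- critical t, two-tailed p = 0.20
    p1  numeric,              -- critical t, two-tailed p = 0.10
    p05 numeric,              -- critical t, two-tailed p = 0.05
    p01 numeric               -- critical t, two-tailed p = 0.01
);

INSERT INTO t_test_table (df, p2, p1, p05, p01) VALUES
    (1, 3.078, 6.314, 12.706, 63.657),
    (2, 1.886, 2.920,  4.303,  9.925),
    (3, 1.638, 2.353,  3.182,  5.841),
    (4, 1.533, 2.132,  2.776,  4.604),
    (5, 1.476, 2.015,  2.571,  4.032);
```

A lookup table like this yields only a significance bracket rather than an exact p-value.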