SQL query for a chi-square test

Time: 2013-08-04 13:10:42

Tags: sql sql-server-2008 chi-squared

I am trying to run a chi-square test on the data set below, which is stored in a table. This is the query I am using for it:

    SELECT sessionnumber, sessioncount, timespent,
           (dim1.cnt * dim2.cnt * dim3.cnt) / (dimall.cnt * dimall.cnt) AS expected
    FROM (SELECT sessionnumber, SUM(CAST(cnt AS bigint)) AS cnt
          FROM d3
          GROUP BY sessionnumber) dim1
    CROSS JOIN (SELECT sessioncount, SUM(CAST(cnt AS bigint)) AS cnt
                FROM d3
                GROUP BY sessioncount) dim2
    CROSS JOIN (SELECT timespent, SUM(CAST(cnt AS bigint)) AS cnt
                FROM d3
                GROUP BY timespent) dim3
    CROSS JOIN (SELECT SUM(CAST(cnt AS bigint)) AS cnt FROM d3) dimall

Sample data:

sessionnumber   sessioncount   timespent   cnt
1               17             28          45
2               22             8           30
3               1              1           2
4               1              1           2
5               8              111         119
6               8              65          73
7               11             5           16
8               1              1           2
9               62             64          126
10              6              42          48
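For reference, the query's expected count for each combination is (dim1.cnt * dim2.cnt * dim3.cnt) / (dimall.cnt * dimall.cnt): the product of the three marginal totals of cnt divided by the square of the grand total. Below is a minimal Python sketch of that calculation on the sample data above, assuming d3 contains exactly these ten rows; the helper names marginal and expected_counts are hypothetical. It uses Python's true division, so fractional expected counts are kept.

```python
from collections import defaultdict
from itertools import product

# Sample rows from the question: (sessionnumber, sessioncount, timespent, cnt)
ROWS = [
    (1, 17, 28, 45), (2, 22, 8, 30), (3, 1, 1, 2), (4, 1, 1, 2),
    (5, 8, 111, 119), (6, 8, 65, 73), (7, 11, 5, 16), (8, 1, 1, 2),
    (9, 62, 64, 126), (10, 6, 42, 48),
]


def marginal(rows, col):
    """Like SELECT <col>, SUM(cnt) FROM d3 GROUP BY <col> (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[col]] += row[3]
    return dict(totals)


def expected_counts(rows):
    """One expected count per row of the CROSS JOIN of the three marginals."""
    dim1, dim2, dim3 = (marginal(rows, col) for col in range(3))
    n = sum(row[3] for row in rows)  # dimall.cnt, the grand total
    return {
        (sn, sc, ts): dim1[sn] * dim2[sc] * dim3[ts] / (n * n)  # true division
        for sn, sc, ts in product(dim1, dim2, dim3)
    }


expected = expected_counts(ROWS)
print(len(expected))                    # 560
print(round(expected[(1, 17, 28)], 4))  # 0.4251
```

With these ten rows the cross join has 10 × 7 × 8 = 560 combinations, and the expected count for sessionnumber 1 is 45 × 45 × 45 / 463² ≈ 0.4251, a value below 1.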

But it returns the wrong values for the chi-square test. This is the output I get:

sessionnumber   sessioncount   timespent   expected
1               23             1           0
2               23             1           0
3               23             1           0
4               23             1           0
5               23             1           0
6               23             1           0
7               23             1           0
8               23             1           0
9               23             1           0
10              23             1           0

I have tried my best and searched a lot for this problem. Please help me solve it! Thanks in advance!

2 answers:

Answer 0 (score: 2)

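This is integer math. Cast dimall.cnt to decimal or numeric, or change the divisor to the following. Keep the outer parentheses: / and * have the same precedence and are evaluated left to right, so without them the expression divides by dimall.cnt and then multiplies by it again.

/((dimall.cnt * 1.00) * (dimall.cnt * 1.00))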

Another example that shows what is actually happening:

select 3/2  -- output = 1, integer math, result is an integer

select 3/2.00  -- output = 1.50
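Python's // operator behaves like T-SQL integer division for non-negative operands (both drop the fractional part), so the effect can be sketched in Python with the numbers for sessionnumber 1 in the sample data: its three marginal totals are all 45 and the grand total is 463. This is an illustration in Python, not T-SQL, and the variable names are hypothetical.

```python
# Marginal totals for sessionnumber 1 in the sample: dim1.cnt * dim2.cnt * dim3.cnt
numerator = 45 * 45 * 45
n = 463  # dimall.cnt, the grand total of cnt in the sample

# Integer operands: the fraction is discarded, as with bigint / bigint in T-SQL.
as_integer = numerator // (n * n)

# One non-integer operand (like multiplying by 1.00 or casting to float) keeps it.
as_fraction = numerator / (n * 1.00 * n)

# / and * share precedence and group left to right, so an unparenthesized
# divisor divides by n and then multiplies by n again.
unparenthesized = numerator / (n * 1.00) * (n * 1.00)

print(as_integer)              # 0
print(round(as_fraction, 4))   # 0.4251
print(round(unparenthesized))  # 91125
```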

Answer 1 (score: 2)

Since you are already casting inside the calculation, you could just as well cast to float instead of bigint:

    SELECT sessionnumber, sessioncount, timespent,
           -- float operands, so the division keeps its fractional part
           (dim1.cnt * dim2.cnt * dim3.cnt) / (dimall.cnt * dimall.cnt) AS expected
    FROM (SELECT sessionnumber, SUM(CAST(cnt AS float)) AS cnt
          FROM d3
          GROUP BY sessionnumber) dim1
    CROSS JOIN (SELECT sessioncount, SUM(CAST(cnt AS float)) AS cnt
                FROM d3
                GROUP BY sessioncount) dim2
    CROSS JOIN (SELECT timespent, SUM(CAST(cnt AS float)) AS cnt
                FROM d3
                GROUP BY timespent) dim3
    CROSS JOIN (SELECT SUM(CAST(cnt AS float)) AS cnt FROM d3) dimall;

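float has about 15 significant digits of precision (a plain float in SQL Server is float(53), an 8-byte double), which is more than enough for any realistic count.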
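The queries above only produce expected counts. Pearson's chi-square statistic then sums (observed - expected)² / expected over every combination in the cross join, where a combination that does not appear in d3 has an observed count of 0. Below is a minimal Python sketch, assuming cnt holds frequencies, that d3 contains only the ten sample rows, and that the null hypothesis is mutual independence of the three columns (which is what the product-of-marginals formula implies); the function name is hypothetical. For an I × J × K table under that hypothesis the degrees of freedom are IJK - I - J - K + 2.

```python
from collections import Counter
from itertools import product

# Sample rows from the question: (sessionnumber, sessioncount, timespent, cnt)
ROWS = [
    (1, 17, 28, 45), (2, 22, 8, 30), (3, 1, 1, 2), (4, 1, 1, 2),
    (5, 8, 111, 119), (6, 8, 65, 73), (7, 11, 5, 16), (8, 1, 1, 2),
    (9, 62, 64, 126), (10, 6, 42, 48),
]


def chi_square_mutual_independence(rows):
    """Pearson chi-square statistic and degrees of freedom for the hypothesis
    that the three columns are mutually independent (hypothetical helper)."""
    observed = Counter()
    margins = (Counter(), Counter(), Counter())
    for sn, sc, ts, cnt in rows:
        observed[(sn, sc, ts)] += cnt
        margins[0][sn] += cnt
        margins[1][sc] += cnt
        margins[2][ts] += cnt
    n = sum(observed.values())  # grand total, dimall.cnt in the SQL

    stat = 0.0
    # Iterate over every cell of the CROSS JOIN, including empty ones.
    for sn, sc, ts in product(margins[0], margins[1], margins[2]):
        exp = margins[0][sn] * margins[1][sc] * margins[2][ts] / (n * n)
        obs = observed[(sn, sc, ts)]  # Counter returns 0 for cells missing from d3
        stat += (obs - exp) ** 2 / exp

    i, j, k = (len(m) for m in margins)
    df = i * j * k - i - j - k + 2
    return stat, df


stat, df = chi_square_mutual_independence(ROWS)
print(df)  # 537
print(stat)
```

When most expected counts are well below 1, as in this sample, the chi-square approximation to the statistic's distribution is unreliable, so any p-value derived from it should be treated with caution.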