Constructing a 2x2 contingency table in SQL

Time: 2014-06-05 10:45:34

Tags: mysql sql contingency

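I am trying to build a 2x2 contingency table, as shown in the following link:

Ad hoc 2x2 contingency tables SQL Server 2008 (I tried to understand the code there but could not get my head around it)

A loop builds the pairs C1,C1 C1,C2 C2,C1 C2,C2 (the Cartesian product).

These pairs are supplied as parameters to the SQL code. For this example I have given the SQL code one pair -> C1,C1

For pairs of different concepts, such as C1,C2 and C2,C1, the tables come out correct (after the corrections described below). For the pairs C1,C1 or C2,C2, the code builds a wrong contingency table.

For example (the table name is alpha_occurence):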

id   concept_uri   document_uri
1    C1            D1
2    C2            D1

Given the table above, the 2x2 contingency table for the pair C1,C1 should be:

       C1     not C1
    C1  1     0
not C1  0     -

But instead it gives (after the corrections described below):

       C1    not C1
    C1  0    1
not C1  1    -

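Note that I have put - in the (not C1, not C1) cell, because that value is calculated by other means.

This SQL code is used to retrieve the values: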

SELECT count(*) AS total FROM  
(SELECT document_uri,count(DISTINCT concept_uri) AS count_conc FROM mydb.alpha_occurence 
WHERE concept_uri IN ('C1','C1') 
GROUP BY document_uri 
HAVING count_conc >=2 ) 
AS amount_of_concept_co_occurence #value of both X and Y

UNION ALL 

SELECT count(*) AS total FROM 
(SELECT concept_uri,document_uri FROM mydb.alpha_occurence
WHERE concept_uri IN ('C1'))
AS only_concept_A #value of Only X not Y

UNION ALL 

SELECT count(*) AS total FROM
(SELECT concept_uri,document_uri FROM mydb.alpha_occurence 
WHERE concept_uri IN ('C1'))
AS only_concept_B #value of Not X only Y
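A likely reason the C1,C1 case goes wrong: when both halves of the pair are the same concept, `IN ('C1','C1')` can match only one distinct `concept_uri`, so `count_conc` is never more than 1 and `HAVING count_conc >=2` rejects every document. The co-occurrence count then comes back as 0, and subtracting 0 from the two single-concept counts leaves 1 in both off-diagonal cells. A minimal sketch of one possible fix, assuming the pair is still substituted into the query as literals, is to lower the threshold to 1 when the two concepts are equal:

```sql
-- Sketch only: the literals 'C1','C1' stand for the pair passed in as parameters.
SELECT count(*) AS total FROM
(SELECT document_uri, count(DISTINCT concept_uri) AS count_conc
 FROM mydb.alpha_occurence
 WHERE concept_uri IN ('C1','C1')
 GROUP BY document_uri
 -- Require 1 distinct concept when both halves of the pair are equal, 2 otherwise.
 HAVING count_conc >= CASE WHEN 'C1' = 'C1' THEN 1 ELSE 2 END)
AS amount_of_concept_co_occurence;
```

On the sample data this returns 1 for the pair C1,C1, so the later subtractions give 0 for both off-diagonal cells, which matches the expected table.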

Once the values are retrieved, a small script is run on them to correct them. It does the following:

To get Only X and not Y            = only_concept_A - amount_of_concept_co_occurence
To get Not X and Only Y            = only_concept_B - amount_of_concept_co_occurence
To get the value of neither X nor Y = total # of documents (not given here, as the sample data only records which concept occurs in which document) - (amount_of_concept_co_occurence + Only X and not Y + Not X and Only Y)
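The three counts and the subtraction step can also be folded into one query using conditional aggregation per document. This is a sketch under assumptions, not the approach from the linked post: it flags, for each document, whether X and Y occur in it, then counts the flag combinations. The aliases `has_x`, `has_y`, `per_document` and the three output column names are made up for illustration, and the literals again stand for the pair parameters.

```sql
-- Sketch only: first 'C1' is X, second 'C1' is Y (the pair parameters).
SELECT
    SUM(CASE WHEN has_x = 1 AND has_y = 1 THEN 1 ELSE 0 END) AS both_x_and_y,
    SUM(CASE WHEN has_x = 1 AND has_y = 0 THEN 1 ELSE 0 END) AS only_x_not_y,
    SUM(CASE WHEN has_x = 0 AND has_y = 1 THEN 1 ELSE 0 END) AS only_y_not_x
FROM (
    -- One row per document, with a 0/1 flag for each concept of the pair.
    SELECT document_uri,
           MAX(CASE WHEN concept_uri = 'C1' THEN 1 ELSE 0 END) AS has_x,
           MAX(CASE WHEN concept_uri = 'C1' THEN 1 ELSE 0 END) AS has_y
    FROM mydb.alpha_occurence
    WHERE concept_uri IN ('C1','C1')
    GROUP BY document_uri
) AS per_document;
```

Because X and Y are flagged independently, the pair C1,C1 needs no special case: on the sample data the query gives 1, 0, 0. The "neither" cell still has to come from the total number of documents, which this table does not hold. If no document matches either concept, the `SUM` columns are NULL rather than 0.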

1 answer:

Answer 0 (score: 1)

I used this script

select concept_uri, document_uri, count(*) as count
from alpha_occurence  -- table name taken from the question; a bare "table" is a reserved word in MySQL
group by concept_uri, document_uri

and they are ready...