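I am trying to build a 2x2 contingency table, as shown in the following link:
Ad hoc 2x2 contingency tables SQL Server 2008 (I tried to understand the code there but could not get my head around it)
A loop builds the pairs, as in C1,C1 C1,C2 C2,C1 C2,C2 (the Cartesian product).
These pairs are passed as parameters to the SQL code. For this example I have given the SQL code one pair -> C1,C1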
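For reference, the pair-building loop could look roughly like this in Python. The question does not say which language the loop is written in, so the language and the name `concepts` are assumptions made only for illustration:

```python
from itertools import product

# Example concept URIs taken from the question; the list name is hypothetical.
concepts = ["C1", "C2"]

# Cartesian product of the list with itself, which includes the
# self-pairs (C1,C1) and (C2,C2) that cause trouble later on.
pairs = list(product(concepts, repeat=2))

for concept_x, concept_y in pairs:
    print(concept_x, concept_y)
```

Each `(concept_x, concept_y)` tuple would then be bound to the SQL parameters for one contingency table.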
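When the tables are built for pairs of different concepts, such as C1,C2 or C2,C1, they are correct (after the modifications described below). For a pair of identical concepts, such as C1,C1 or C2,C2, the code constructs an incorrect contingency table.
For example (the table name is alpha_occurence):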
id  concept_uri  document_uri
1   C1           D1
2   C2           D1
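From the table above, the 2x2 contingency table for the pair C1,C1 should be:
         C1   not C1
C1       1    0
not C1   0    -
But instead I get (after some modifications):
         C1   not C1
C1       0    1
not C1   1    -
Note that I have put - in the not C1, not C1 cell, because that value is calculated by a different method.
This SQL code is used to retrieve the values: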
SELECT count(*) AS total FROM
    (SELECT document_uri, count(DISTINCT concept_uri) AS count_conc
     FROM mydb.alpha_occurence
     WHERE concept_uri IN ('C1','C1')
     GROUP BY document_uri
     HAVING count_conc >= 2)
    AS amount_of_concept_co_occurence  # value of both X and Y
UNION ALL
SELECT count(*) AS total FROM
    (SELECT concept_uri, document_uri
     FROM mydb.alpha_occurence
     WHERE concept_uri IN ('C1'))
    AS only_concept_A  # value of Only X not Y
UNION ALL
SELECT count(*) AS total FROM
    (SELECT concept_uri, document_uri
     FROM mydb.alpha_occurence
     WHERE concept_uri IN ('C1'))
    AS only_concept_B  # value of Not X only Y
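To make the behaviour easier to reproduce, here is a minimal sketch that runs the same three counts against the two sample rows. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, which is an assumption and not the original setup. The `label` column and the helper name `raw_totals` are also additions for illustration, so the three results can be told apart without relying on row order:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alpha_occurence (id INTEGER, concept_uri TEXT, document_uri TEXT)"
)
conn.executemany(
    "INSERT INTO alpha_occurence VALUES (?, ?, ?)",
    [(1, "C1", "D1"), (2, "C2", "D1")],  # sample rows from the question
)

# Same three counts as the query above, with a label column added.
QUERY = """
SELECT 'both' AS label, count(*) AS total FROM
    (SELECT document_uri, count(DISTINCT concept_uri) AS count_conc
     FROM alpha_occurence
     WHERE concept_uri IN (?, ?)
     GROUP BY document_uri
     HAVING count(DISTINCT concept_uri) >= 2) AS amount_of_concept_co_occurence
UNION ALL
SELECT 'only_a', count(*) FROM
    (SELECT concept_uri, document_uri FROM alpha_occurence
     WHERE concept_uri IN (?)) AS only_concept_A
UNION ALL
SELECT 'only_b', count(*) FROM
    (SELECT concept_uri, document_uri FROM alpha_occurence
     WHERE concept_uri IN (?)) AS only_concept_B
"""


def raw_totals(concept_x, concept_y):
    """Return the three uncorrected totals for one pair of concepts."""
    rows = conn.execute(QUERY, (concept_x, concept_y, concept_x, concept_y))
    return dict(rows.fetchall())


print(raw_totals("C1", "C2"))
print(raw_totals("C1", "C1"))
```

With an identical pair the `IN` list contains only one distinct value, so `count(DISTINCT concept_uri)` can never reach 2 and the co-occurrence count comes back as 0. That appears to be where the wrong C1,C1 table originates.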
After the values are retrieved, a small script is run over them to correct them. It does the following:
To get Only X and not Y = only_concept_A - amount_of_concept_co_occurence
To get Not X and Only Y = only_concept_B - amount_of_concept_co_occurence
To get the value of neither X nor Y = total # of documents - (amount_of_concept_co_occurence + Only X and not Y + Not X and Only Y). The total number of documents is not given here, because the sample data only records which concept occurs in which document.
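The correction step can be sketched as a small function. The question does not show the actual script, so the function name, argument names and dictionary keys below are hypothetical, and `total_documents` is optional because the sample data does not provide it:

```python
def contingency_cells(both, total_x, total_y, total_documents=None):
    """Turn the three raw totals into the cells of a 2x2 contingency table.

    both            -- documents containing X and Y (amount_of_concept_co_occurence)
    total_x         -- all occurrences of X (only_concept_A before correction)
    total_y         -- all occurrences of Y (only_concept_B before correction)
    total_documents -- total number of documents, if known
    """
    only_x = total_x - both  # Only X and not Y
    only_y = total_y - both  # Not X and only Y
    neither = None
    if total_documents is not None:
        neither = total_documents - (both + only_x + only_y)
    return {
        "x_and_y": both,
        "x_not_y": only_x,
        "y_not_x": only_y,
        "neither": neither,
    }


# Totals behind the C1,C1 table reported in the question.
print(contingency_cells(both=0, total_x=1, total_y=1))
```

Feeding in both = 0, total_x = 1, total_y = 1 reproduces the 0 / 1 / 1 table reported above for C1,C1. With both = 1 the off-diagonal cells become 0, which is the table that was expected.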
Answer 0 (score: 1)
I used this script:
select concept_uri, document_uri, count(*) as count
from table
group by concept_uri, document_uri
and the counts were ready...
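As a rough illustration of what this query returns for the sample rows in the question, here is a sketch using Python's sqlite3 module. The in-memory database, the substitution of alpha_occurence for the `table` placeholder, the `occurrences` alias and the added ORDER BY (for stable output) are all assumptions, not part of the answer:

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alpha_occurence (id INTEGER, concept_uri TEXT, document_uri TEXT)"
)
conn.executemany(
    "INSERT INTO alpha_occurence VALUES (?, ?, ?)",
    [(1, "C1", "D1"), (2, "C2", "D1")],  # sample rows from the question
)

# One row per (concept, document) combination with its occurrence count.
rows = conn.execute(
    """
    SELECT concept_uri, document_uri, count(*) AS occurrences
    FROM alpha_occurence
    GROUP BY concept_uri, document_uri
    ORDER BY concept_uri, document_uri
    """
).fetchall()

for row in rows:
    print(row)
```

The result is one count per concept and document, which would still have to be combined per pair to fill the four cells of a contingency table.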