Counting occurrences of features in SQLite

Posted: 2014-10-01 23:53:35

Tags: sql sqlite

I have a dataset of word triples and their frequencies, for example:

w1  w2  w3   freq
a   a   a    4
a   a   and  3
a   a   band 1
a   a   well 1
a   and a    2

For each observation, I want to derive counts according to the following contingency table:

            (w3)   not(w3)
(w1,w2)      n1     n2
not(w1,w2)   n3     n4

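where n1, ..., n4 are the sums of the frequencies of the observations that satisfy each condition. For example, in the first observation, w1 = a, w2 = a, w3 = a. We first look at all observations where w1 = a, w2 = a, w3 = a: only one observation meets that criterion, and its frequency is 4, so n1 = 4. Next we take w1 = a, w2 = a, w3 != a, which matches observations with frequencies 3, 1, and 1, summing to 5, so n2 = 5. Then w1 != a, w2 != a, w3 = a matches nothing, so n3 = 0, and w1 != a, w2 != a, w3 != a likewise gives n4 = 0.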
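To make the arithmetic above concrete, here is a minimal Python sketch that computes the four sums for a single observation using only the five sample rows from the question. The function name `contingency` is hypothetical, and the sketch assumes that not(w1,w2) means both w1 and w2 differ from the observation's values, as in the worked example; if it should instead mean "not both equal", the n3 and n4 conditions would need to change.

```python
# Sample rows from the question: (w1, w2, w3, freq)
rows = [
    ("a", "a", "a", 4),
    ("a", "a", "and", 3),
    ("a", "a", "band", 1),
    ("a", "a", "well", 1),
    ("a", "and", "a", 2),
]


def contingency(rows, w1, w2, w3):
    """Return (n1, n2, n3, n4) for the observation (w1, w2, w3).

    Assumption: not(w1,w2) is read as "w1 differs AND w2 differs",
    which is how the worked example in the question treats it.
    """
    n1 = sum(f for a, b, c, f in rows if a == w1 and b == w2 and c == w3)
    n2 = sum(f for a, b, c, f in rows if a == w1 and b == w2 and c != w3)
    n3 = sum(f for a, b, c, f in rows if a != w1 and b != w2 and c == w3)
    n4 = sum(f for a, b, c, f in rows if a != w1 and b != w2 and c != w3)
    return n1, n2, n3, n4


first = contingency(rows, "a", "a", "a")
second = contingency(rows, "a", "a", "and")
print(first)   # (4, 5, 0, 0)
print(second)  # (3, 6, 0, 0)
```

These two results match the first two rows of the desired output.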

I would like the output to be a table like this:

w1  w2  w3   freq  n1  n2  n3  n4
a   a   a    4     4   5   0   0
a   a   and  3     3   6   0   0
a   a   band 1
a   a   well 1
a   and a    2
etc.

How can I achieve this with sqlite3?

1 answer:

Answer 0 (score: 1)

This can be done with correlated scalar subqueries, one per count:

SELECT w1,
       w2,
       w3,
       freq,
       (SELECT SUM(freq)
        FROM MyLittleTable AS T2
        WHERE T2.w1 = T1.w1
          AND T2.w2 = T1.w2
          AND T2.w3 = T1.w3
       ) AS n1,
       (SELECT SUM(freq)
        FROM MyLittleTable AS T2
        WHERE T2.w1  = T1.w1
          AND T2.w2  = T1.w2
          AND T2.w3 != T1.w3
       ) AS n2,
       ...
FROM MyLittleTable AS T1
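For anyone who wants to try this end to end, the following is a rough sketch using Python's standard `sqlite3` module and an in-memory database loaded with the five sample rows from the question. The n3 and n4 subqueries are not spelled out in the answer, so the versions here are an assumption based on the question's own reading of not(w1,w2) (both words differ). The `COALESCE(..., 0)` wrapper is also an addition: `SUM` over zero matching rows returns NULL in SQLite, while the desired output shows 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyLittleTable (w1 TEXT, w2 TEXT, w3 TEXT, freq INTEGER)"
)
conn.executemany(
    "INSERT INTO MyLittleTable VALUES (?, ?, ?, ?)",
    [
        ("a", "a", "a", 4),
        ("a", "a", "and", 3),
        ("a", "a", "band", 1),
        ("a", "a", "well", 1),
        ("a", "and", "a", 2),
    ],
)

# n1 and n2 follow the answer; n3 and n4 are assumed by analogy.
# COALESCE turns the NULL that SUM yields for "no matching rows" into 0.
QUERY = """
SELECT w1, w2, w3, freq,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 =  T1.w1 AND T2.w2 =  T1.w2
                   AND T2.w3 =  T1.w3), 0) AS n1,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 =  T1.w1 AND T2.w2 =  T1.w2
                   AND T2.w3 != T1.w3), 0) AS n2,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 != T1.w1 AND T2.w2 != T1.w2
                   AND T2.w3 =  T1.w3), 0) AS n3,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 != T1.w1 AND T2.w2 != T1.w2
                   AND T2.w3 != T1.w3), 0) AS n4
FROM MyLittleTable AS T1
ORDER BY T1.rowid
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On the sample data the first two rows come out as (a, a, a, 4, 4, 5, 0, 0) and (a, a, and, 3, 3, 6, 0, 0), matching the desired output. Each subquery rescans the table once per outer row, so on a large table an index on (w1, w2, w3) would likely help.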