Calculating values based on sentence ID and stem frequency

Time: 2012-09-10 02:16:47

Tags: php mysql sql

I have this table:

=========================================================================
| id | stem_before | stem_after | stem_freq | sentence_id | document_id | 
=========================================================================
|  1 |     a       |     b      |    1      |   0         |       1     |    
|  2 |     c       |     d      |    1      |   0         |       1     |        
|  3 |     e       |     f      |    1      |   1         |       1     |
|  4 |     g       |     h      |    1      |   2         |       1     |
|  5 |     i       |     j      |    2      |   0         |       2     |
|  6 |     k       |     l      |    1      |   0         |       2     |
=========================================================================

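I want to do this in 2 steps. The first step is 1 divided by the sum of the stem_freq values for each combination of sentence_id and document_id. The second step multiplies the result of the first step by the row's own stem_freq value.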

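For example:

For the rows with document_id = 1 and sentence_id = 0, the first step is 1/(1+1) = 0.5. The second step for id = 1 is 1*0.5 = 0.5, and for id = 2 it is 1*0.5 = 0.5.

For the rows with document_id = 2 and sentence_id = 0, the first step is 1/(2+1) = 0.3333. The second step for id = 5 is 2*0.3333 = 0.6666, and for id = 6 it is 1*0.3333 = 0.3333.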
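The two steps can be checked outside the database with a minimal sketch. It is written in Python rather than PHP purely for illustration, the sample rows are copied from the table above, and the name `compute_tf` is hypothetical. Note that exact arithmetic gives about 0.6667 for id = 5; the 0.6666 in the example comes from rounding 1/3 to 0.3333 first.

```python
from collections import defaultdict

# Sample rows copied from the table: (id, stem_freq, sentence_id, document_id)
rows = [
    (1, 1, 0, 1),
    (2, 1, 0, 1),
    (3, 1, 1, 1),
    (4, 1, 2, 1),
    (5, 2, 0, 2),
    (6, 1, 0, 2),
]


def compute_tf(rows):
    """Return {id: stem_freq * (1 / group total)} for every row."""
    # Step 1: total stem_freq per (document_id, sentence_id) group.
    totals = defaultdict(int)
    for _id, freq, sentence_id, document_id in rows:
        totals[(document_id, sentence_id)] += freq

    # Step 2: multiply each row's own stem_freq by 1 / its group total.
    return {
        _id: freq * (1 / totals[(document_id, sentence_id)])
        for _id, freq, sentence_id, document_id in rows
    }


tf = compute_tf(rows)
for row_id in sorted(tf):
    print(row_id, round(tf[row_id], 4))
```

Every row gets its own result here, which is the behaviour the query needs to reproduce.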

Here is my code:

$query = mysql_query ("SELECT sentence_id, document_id, stem_after, 
stem_freq,SUM(stem_freq) as freq 
FROM tb_stemming 
WHERE document_id ='$doc_id' 
GROUP BY(sentence_id)");

while ($row = mysql_fetch_array($query)) {
   $a    = $row['freq'];
   $freq = $row['stem_freq'];
   $tf   = $freq/$a;
}

But it only gives the result for the first row of each distinct sentence. Can you help me? Thank you :))

1 answer:

Answer 0: (score: 1)

Try this:

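SELECT
    a.*,
    -- Step 2: multiply each row's stem_freq by its group's step-1 value
    a.stem_freq * b.value AS tf
FROM
    tb_stemming AS a
    JOIN
    (
        -- Step 1: 1 / (sum of stem_freq) per document_id and sentence_id
        SELECT
            document_id,
            sentence_id,
            1 / SUM(stem_freq) AS `value`
        FROM
            tb_stemming
        GROUP BY document_id, sentence_id
    ) AS b
    ON a.document_id = b.document_id AND a.sentence_id = b.sentence_id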
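
To see that the join returns one result per row rather than one per group, here is a rough, self-contained check of the query above against the sample data from the question. It rests on several assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database instead of PHP and MySQL, the column types are guesses, and the numerator is written as `1.0` because SQLite would otherwise perform integer division.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tb_stemming (
        id          INTEGER PRIMARY KEY,
        stem_before TEXT,
        stem_after  TEXT,
        stem_freq   INTEGER,
        sentence_id INTEGER,
        document_id INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO tb_stemming VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "a", "b", 1, 0, 1),
        (2, "c", "d", 1, 0, 1),
        (3, "e", "f", 1, 1, 1),
        (4, "g", "h", 1, 2, 1),
        (5, "i", "j", 2, 0, 2),
        (6, "k", "l", 1, 0, 2),
    ],
)

# Same shape as the query above; 1.0 forces floating-point division in SQLite.
query = """
    SELECT a.id, a.stem_freq * b.value AS tf
    FROM tb_stemming AS a
    JOIN (
        SELECT document_id, sentence_id,
               1.0 / SUM(stem_freq) AS value
        FROM tb_stemming
        GROUP BY document_id, sentence_id
    ) AS b
    ON a.document_id = b.document_id AND a.sentence_id = b.sentence_id
    ORDER BY a.id
"""

# Map each row id to its computed value.
result = dict(conn.execute(query).fetchall())
for row_id, tf in result.items():
    print(row_id, round(tf, 4))
```

The grouping happens only inside the subquery, so the outer SELECT still sees all six rows. To restrict the output to one document as in the question, a WHERE clause on a.document_id could be added to the outer query, ideally with a bound parameter rather than string interpolation.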