我想算idf,公式为IDF=log(D/df)
,D
是总数据,df
是包含搜索词的许多数据。
从表中:
1. tb_stemming
===========================================================================
|stem_id | stem_before | stem_after | stem_freq | sentence_id |document_id|
===========================================================================
| 1 | Data | Data | 1 | 0 | 1 |
| 2 | Discuss | Discuss | 1 | 1 | 1 |
| 3 | Mining | Min | 1 | 0 | 2 |
===========================================================================
这是代码:
countIDF($total_sentence,$doc_id);
$total_sentence
是
Array ( [0] => 644 [1] => 79 [2] => 264 [3] => 441 [4] => 502 [5] => 18 [6] => 352 [7] => 219 [8] => 219 )
function countIDF($total_sentence, $doc_id) {
foreach ($total_sentence as $doc_id => $total_sentences){
$idf = 0;
$query1 = mysql_query("SELECT document_id, DISTINCT(stem_after) AS unique_token FROM tb_stemming group by stem_after where document_id='$doc_id' ' ");
while ($row = mysql_fetch_array($query)) {
$token = $row['unique_token'];
$doc_id = $row['document_id'];
$ndw = countNDW($token);
$idf = log($total_sentences / $ndw)+1;
$q = mysql_query("INSERT INTO tb_idf VALUES ('','$doc_id','$token','$ndw','$idf') ");
}
}
}
并且countNDW的功能是:
function countNDW ($word) {
$query = mysql_query("SELECT stem_after, COUNT( DISTINCT sentence_id ) AS ndw FROM `tb_stemming` WHERE stem_after = '$word' GROUP BY stem_after");
while ($row = mysql_fetch_array($query)) {
$ndw = $row['ndw'];
}
return $ndw;
}
它无法正常工作,特别是在数据库调用时。我所需要的只是计入每个document_id
。如何在我的代码中定义它?拜托,帮帮我..非常感谢你:)。