tf-idf error

Date: 2012-12-11 21:08:36

Tags: php mysql algorithm tf-idf

Hi, I have a problem with tf-idf. The code always prints "0".

Here is the code:

$terms = array_count_values( explode( ' ', $frase ) );
$total_term = asort( $terms );
$total_array = count($total_term);

for ($i=1; $i<=$total_array; $i++){
    $SQL = mysql_query("SELECT webTitulo, webDescripcion, webkeywords, weburl FROM webs WHERE MATCH (webTitulo, webDescripcion, webkeywords, weburl) AGAINST ('$total_term[$i]')", $server_link) or die(mysql_error());
    $frec_term = mysql_num_rows($SQL);
}
$sssql = mysql_query("SELECT uDR.webTitulo, uDR.webDescripcion, uDR.webkeywords, uDR.weburl, SUM(uDR.priority) as SPriority
FROM (

(SELECT s1.webTitulo, s1.webDescripcion, s1.weburl, s1.webkeywords, 3 as priority FROM webs s1 WHERE MATCH (webTitulo) AGAINST ('$frase'))

UNION

(SELECT s2.webTitulo, s2.webDescripcion, s2.weburl, s2.webkeywords, 1 as priority FROM webs s2 WHERE MATCH (webkeywords) AGAINST ('$frase'))

UNION

(SELECT s3.webTitulo, s3.webDescripcion, s3.weburl, s3.webkeywords, 2 as priority FROM webs s3 WHERE MATCH (webDescripcion) AGAINST ('$frase'))) uDR

GROUP BY uDR.webTitulo, uDR.weburl, uDR.webDescripcion, uDR.webkeywords

ORDER BY SPriority DESC ", $server_link) 
                         or die(mysql_error()); 
$totalRows = mysql_num_rows($sssql);
$tf_idf = $frec_term * log10($totalRows/70);
echo $tf_idf;

The 70 is a placeholder number standing in for a variable that doesn't exist yet.
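For comparison, here is the commonly used textbook form of tf-idf. tf(t, d) is how often term t occurs in document d, N is the total number of documents, and df(t) is the number of documents that contain t. Mapping the question's variables onto it is an assumption: `$totalRows/70` sits where N / df(t) would go. Because log10(1) = 0, the last line of the code above would also print 0 whenever `$totalRows` is exactly 70, regardless of `$frec_term`.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log_{10}\frac{N}{\mathrm{df}(t)}
```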

Regards

1 answer:

Answer 0 (score: 0)

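Your logic is wrong. If $frase is a sentence, $total_term ends up as true rather than an array, because asort() sorts $terms in place and returns a boolean, not the sorted array. See this example.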
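A small stand-alone sketch of the asort() pitfall. The phrase below is a hypothetical value chosen for illustration; in the question, $frase comes from elsewhere.

```php
<?php
// Hypothetical input phrase, for illustration only.
$frase = 'red apple red pear';

// Word => number of occurrences, e.g. 'red' => 2.
$terms = array_count_values(explode(' ', $frase));

// asort() sorts $terms in place (by value, keeping the keys) and returns a
// bool, so assigning its result discards the sorted array.
$result = asort($terms);
var_dump($result);   // bool(true)

// The sorted data lives in $terms itself.
print_r($terms);
```

To keep the sorted array, call `asort($terms);` as a statement on its own and keep working with `$terms`.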

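Even if $total_term were an array, its keys would not be numbers: array_count_values() uses the words themselves as keys, so $total_term[$i] with $i = 1, 2, ... does not find anything.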
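A minimal sketch showing what array_count_values() actually returns and how to walk it with foreach instead of a numeric index. The phrase is a hypothetical example.

```php
<?php
// Hypothetical phrase used only for illustration.
$frase = 'the cat saw the dog';

$terms = array_count_values(explode(' ', $frase));
// $terms is ['the' => 2, 'cat' => 1, 'saw' => 1, 'dog' => 1]:
// the keys are the words, not 0, 1, 2, ...

var_dump(isset($terms[1]));   // bool(false): there is no key 1

// foreach yields each word (key) together with its count (value).
foreach ($terms as $word => $count) {
    echo "$word: $count\n";
}
```

Inside such a loop, `$word` is the value you would pass to the full-text search.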

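Even if the keys were numeric, they would most likely be zero-based. Your loop starts at 1 and runs while $i <= $total_array, so it skips index 0 and, on its last pass, reads an element that does not exist. That searches for an empty string, which matches no rows, so the last $frec_term = mysql_num_rows($SQL); evaluates to 0. Since $frec_term is overwritten on every pass, that last value is the only one that survives the loop.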
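A sketch of the off-by-one, using a hypothetical zero-based list of words in place of the question's variables. If a per-term result is needed, storing one value per word (for example in an array keyed by word) avoids overwriting a single `$frec_term` on each pass.

```php
<?php
// Hypothetical zero-based list of words: keys 0, 1, 2.
$words = ['red', 'apple', 'pear'];
$n = count($words);

// Same loop shape as the question: starts at 1 and uses <=, so it skips
// index 0 and, on the last pass, reads $words[3], which does not exist.
$visited = [];
for ($i = 1; $i <= $n; $i++) {
    $visited[] = $words[$i] ?? null;   // ?? avoids the undefined-offset warning
}
// $visited is ['apple', 'pear', null]

// Zero-based loop: visits every element exactly once.
$fixed = [];
for ($i = 0; $i < $n; $i++) {
    $fixed[] = $words[$i];
}
// $fixed is ['red', 'apple', 'pear']
```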

So there are several reasons why $tf_idf is 0, and all of them come down to $frec_term being NULL, undefined, or 0.
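For completeness, here is a minimal in-memory sketch of the standard tf-idf calculation with the problems above avoided. It is not the asker's setup: it assumes the documents are a plain PHP array of strings split on single spaces (like the question's explode(' ', ...)) instead of rows in the `webs` table, and the function name `tfIdf` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: tf-idf of every query term in every document.
 * tf = occurrences of the term in the document,
 * idf = log10(N / df), df = number of documents containing the term.
 *
 * @param string[] $docs
 * @return array<int, array<string, float>> doc index => [term => score]
 */
function tfIdf(string $phrase, array $docs): array
{
    $queryTerms = array_keys(array_count_values(explode(' ', $phrase)));
    $n = count($docs);

    // Word counts for each document.
    $docCounts = [];
    foreach ($docs as $i => $doc) {
        $docCounts[$i] = array_count_values(explode(' ', $doc));
    }

    // Document frequency of each query term, computed once per term.
    $df = [];
    foreach ($queryTerms as $term) {
        $df[$term] = 0;
        foreach ($docCounts as $counts) {
            if (isset($counts[$term])) {
                $df[$term]++;
            }
        }
    }

    // One score per (document, term): nothing is overwritten in the loop.
    $scores = [];
    foreach ($docCounts as $i => $counts) {
        $scores[$i] = [];
        foreach ($queryTerms as $term) {
            $tf = $counts[$term] ?? 0;
            // A term found in no document scores 0 instead of dividing by zero.
            $scores[$i][$term] = $df[$term] > 0 ? $tf * log10($n / $df[$term]) : 0.0;
        }
    }
    return $scores;
}

$docs = ['red apple', 'green pear', 'red pear red'];
$scores = tfIdf('red pear', $docs);
printf("%.4f\n", $scores[2]['red']);   // prints 0.3522
```

With these definitions a term that appears in every document also gets 0, because log10(N / N) = 0, so a 0 score is not always a bug.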