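How to get the most popular words from multiple content tables in PHP/MySQL.
For example, I have a table forum_post for forum posts; it contains a subject and content. Besides that, I have multiple other tables with different fields that can also contain content to be analysed.
I would probably do it myself: fetch all the content, strip (possible) HTML, explode the string on spaces, remove quotes, commas and so on, and then count the words that are not common ones by keeping an array while running through all the words.
My main question is whether someone knows of a method that might be easier or faster.
I can't seem to find any helpful answers; it may be that I am using the wrong search patterns.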
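Answer 0: (score: 3)
The magic you are looking for is a PHP function called str_word_count().
In the example code below, if you get a lot of extraneous words out of it, you will need to write custom stripping to remove them. You will also want to strip all HTML tags and other stray characters from the words.
I use something similar to generate keywords (obviously that code is proprietary). In short, we take the provided text, count how often each word occurs, and sort the array by that frequency, so the most frequent words come first in the output. We don't count words that occur only once.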
<?php
$text = "your text.";

// Set up the array for storing word counts
$freqData = array();

// With format 1, str_word_count() returns an array of the words found in $text
foreach (str_word_count($text, 1) as $word) {
    // Increment the word's count, or start it at one the first time it is seen
    if (array_key_exists($word, $freqData)) {
        $freqData[$word]++;
    } else {
        $freqData[$word] = 1;
    }
}

// Sort by count, highest first, keeping the words as keys
arsort($freqData);

$list = '';
foreach ($freqData as $word => $count) {
    // Skip words that occur only once
    if ($count > 1) {
        $list .= "$word ";
    }
}

if (empty($list)) {
    $list = "Not enough duplicate words for popularity contest.";
}

echo $list;
?>
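The custom stripping mentioned above might look something like the following minimal sketch. The function name `topWords`, the `$stopWords` list and the `$minCount` threshold are assumptions made for illustration and are not part of the code above; the sketch also lowercases the text, which the original example does not do.

```php
<?php
// Hypothetical helper (not from the original answer): removes HTML from the
// text, lowercases it, drops stop words, and returns word => count pairs.
function topWords($text, array $stopWords = array(), $minCount = 2)
{
    // Put a space before every tag so "<p>a</p><p>b</p>" does not become "ab"
    $text = str_replace('<', ' <', $text);
    // Remove HTML tags, then turn entities such as &amp; back into characters
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');
    // Lowercase so that "Word" and "word" are counted together
    $text = strtolower($text);

    // array_count_values() builds the frequency table in one call
    $freq = array_count_values(str_word_count($text, 1));

    // Drop the stop words (assumed list supplied by the caller)
    foreach ($stopWords as $stop) {
        unset($freq[strtolower($stop)]);
    }

    // Keep only words that occur at least $minCount times
    $freq = array_filter($freq, function ($count) use ($minCount) {
        return $count >= $minCount;
    });

    // Most frequent words first
    arsort($freq);
    return $freq;
}

$html = '<p>The cat sat.</p><p>The cat ran &amp; the dog sat.</p>';
print_r(topWords($html, array('the')));
```

Note that str_word_count() is locale dependent and only treats letters, apostrophes and hyphens as word characters, so text in other alphabets would probably need a different splitter.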
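Answer 1: (score: 0)
I see you have already accepted an answer, but I want to give you an alternative that might be more flexible in a sense (decide for yourself :-)). I haven't tested the code, but I think you get the picture. $dbh is a PDO connection object. It is then up to you what you do with the resulting $words array.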
<?php
// Assumes $dbh is an open PDO connection to MySQL that throws exceptions on errors.
$words = array('word' => array(), 'wordcount' => array());

$tableName = 'party'; // The name of the table
countWordsFromTable($dbh, $words, $tableName);
$tableName = 'party2'; // The name of the table
countWordsFromTable($dbh, $words, $tableName);

// Example output array:
/*
$words['word'][0] = 'happy'; //Happy from table party
$words['wordcount'][0] = 5;
$words['word'][1] = 'bulldog'; //Bulldog from table party2
$words['wordcount'][1] = 15;
$words['word'][2] = 'pokerface'; //Pokerface from table party2
$words['wordcount'][2] = 2;
*/

if (!empty($words['wordcount'])) {
    // Get all indexes that hold the maximum value of the wordcount list
    $maxValues = array_keys($words['wordcount'], max($words['wordcount']));
    $popularIndex = $maxValues[0]; // Get only one value...
    $mostPopularWord = $words['word'][$popularIndex];
}

function countWordsFromTable(PDO $dbh, array &$words, $tableName) {
    // Table and column names cannot be bound as parameters, so validate the
    // table name and quote identifiers with backticks instead
    if (!preg_match('/^[A-Za-z0-9_]+$/', $tableName)) {
        throw new InvalidArgumentException("Invalid table name: $tableName");
    }

    // Get all fields from the specific table (DESCRIBE lists the field name first)
    $q = $dbh->query("DESCRIBE `$tableName`");
    $tableFields = $q->fetchAll(PDO::FETCH_COLUMN);

    // Go through all fields and store their content and word counts in $words
    foreach ($tableFields as $dbCol) {
        $col = '`' . str_replace('`', '``', $dbCol) . '`';
        // For every row: the column content plus the number of space-separated words in it
        $wordCountQuery = "SELECT $col AS word, LENGTH($col) - LENGTH(REPLACE($col, ' ', '')) + 1 AS wordcount FROM `$tableName`";
        $q = $dbh->query($wordCountQuery);
        $wrds = $q->fetchAll(PDO::FETCH_ASSOC);

        // Add result to array $words
        foreach ($wrds as $w) {
            $words['word'][] = $w['word'];
            $words['wordcount'][] = $w['wordcount'];
        }
    }
}
?>
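Along the same lines, here is a minimal sketch of how content from several tables could be merged into one frequency table on the PHP side. The function name `addRowsToFrequency` and the sample rows are hypothetical; the arrays stand in for what a PDO `fetchAll(PDO::FETCH_ASSOC)` call would return, and the column names `subject` and `content` are only assumed from the question's description of forum_post.

```php
<?php
// Hypothetical helper: adds the words found in every column of every row
// to the running frequency table $freq (word => count).
function addRowsToFrequency($rows, array &$freq)
{
    foreach ($rows as $row) {
        foreach ($row as $value) {
            // Strip HTML and lowercase before splitting into words
            $clean = strtolower(strip_tags((string) $value));
            foreach (str_word_count($clean, 1) as $word) {
                $freq[$word] = isset($freq[$word]) ? $freq[$word] + 1 : 1;
            }
        }
    }
}

// Stand-in data. With a database, these rows might come from something like
// $dbh->query('SELECT subject, content FROM forum_post')->fetchAll(PDO::FETCH_ASSOC)
$forumPosts = array(
    array('subject' => 'Happy bulldog', 'content' => '<b>Bulldog</b> party'),
);
$otherTable = array(
    array('title' => 'Party time', 'body' => 'bulldog'),
);

$freq = array();
addRowsToFrequency($forumPosts, $freq);
addRowsToFrequency($otherTable, $freq);

// Most frequent words first
arsort($freq);
print_r($freq);
```

Because the same $freq array is passed by reference to every call, any number of tables with different fields can be added to it before sorting.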