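I'm trying to search an entire table and return the most frequently occurring phrases (up to three words) found in long strings. I believe I could use full-text search, but I can't get anything to match...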
Table
I like Iron Man 3 so much
Iron Man 3 sucked alot
Iron Man really saved the day
I like cats
cats are cool
Result
Iron Man
Iron Man 3
cats
Query
SELECT *
FROM table
WHERE substring(text, up to 3 words) OCCURS MOST
ORDER BY OCCURRENCE DESC
Answer 0 (score: 0)
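If that is really what you want, I'd say you should parse the text outside of SQL and store the three-word sequences in a table called phrases, like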
create table phrases (word1 varchar(255), word2 varchar(255), word3 varchar(255), cnt int);
and the code:
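// query(), array_read_line() and rows_affected() stand in for your database layer
$q = query("select comment from comments");
while ($row = array_read_line($q)) {
    // Split on runs of whitespace and drop empty tokens
    $words = preg_split('/\s+/', $row['comment'], -1, PREG_SPLIT_NO_EMPTY);
    $previous1 = false;
    $previous2 = false;
    foreach ($words as $word) {
        // Strict comparison so that a word such as "0" is not skipped
        if ($previous1 !== false && $previous2 !== false) {
            // .. here comes quoting, security, SQL-injection safety, min length
            query("update phrases set cnt = cnt + 1 "
                . " where word1 = '$previous1' and word2 = '$previous2' and word3 = '$word'");
            if (rows_affected() == 0) {
                query("insert into phrases "
                    . " set cnt = 1, word1 = '$previous1', "
                    . " word2 = '$previous2', word3 = '$word'");
            }
        }
        // Slide the three-word window forward by one word
        $previous1 = $previous2;
        $previous2 = $word;
    }
}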
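The loop above records only sequences of exactly three words, while the question asks for phrases of up to three words. As a rough sketch of how one-, two- and three-word phrases could all be counted, here is the same sliding-window idea written in Python (not the PHP used in this answer) so it runs without a database. The function name `count_phrases` and the `max_words` parameter are made up for illustration, and the rows are the sample data from the question.

```python
from collections import Counter

def count_phrases(rows, max_words=3):
    """Count every run of 1..max_words consecutive words across all rows."""
    counts = Counter()
    for text in rows:
        words = text.split()  # splits on any whitespace and drops empty tokens
        for n in range(1, max_words + 1):
            # Slide a window of n words across the row
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts

rows = [
    "I like Iron Man 3 so much",
    "Iron Man 3 sucked alot",
    "Iron Man really saved the day",
    "I like cats",
    "cats are cool",
]
counts = count_phrases(rows)
print(counts["Iron Man"], counts["Iron Man 3"], counts["cats"])
```

Note that raw counts alone do not reproduce the result list in the question: single words such as "Iron" occur just as often as "Iron Man", so some extra rule (a stop-word list, or preferring the longest phrase among ties) would probably be needed.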
Then order by the count.
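To make the final step concrete, here is a minimal sketch of the update-or-insert counting followed by a query ordered by count. It is written in Python with an in-memory SQLite database as a stand-in for whatever database the answer has in mind. The helper `add_phrase`, the primary key, and the `limit 3` are assumptions added for the example, and only two of the question's sample rows are loaded to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table phrases (word1 text, word2 text, word3 text, cnt int, "
    "primary key (word1, word2, word3))"
)

def add_phrase(w1, w2, w3):
    # The ? placeholders leave quoting to the driver, which avoids SQL injection
    cur = conn.execute(
        "update phrases set cnt = cnt + 1 "
        "where word1 = ? and word2 = ? and word3 = ?",
        (w1, w2, w3),
    )
    if cur.rowcount == 0:  # no existing row was updated, so create one
        conn.execute(
            "insert into phrases (word1, word2, word3, cnt) values (?, ?, ?, 1)",
            (w1, w2, w3),
        )

for text in ["I like Iron Man 3 so much", "Iron Man 3 sucked alot"]:
    words = text.split()
    for i in range(len(words) - 2):
        add_phrase(*words[i:i + 3])

# The most frequent three-word phrases come first
top = conn.execute(
    "select word1, word2, word3, cnt from phrases order by cnt desc limit 3"
).fetchall()
print(top[0])
```

With these two rows, the only three-word phrase that occurs twice is "Iron Man 3", so it sorts to the top; the order among the remaining phrases, which all have a count of 1, is not defined.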