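I have a search engine that scans all the words in a given web page and then displays their occurrences. Pages are then ranked by how many times that word occurs in the document. However, it does not return results for multi-term queries.
Below is my SQL query. I want it to check all the words entered and then rank pages by how many times those words appear in the document. At the moment it only works for single-term queries.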
$result = mysql_query(" SELECT p.page_url AS url,
COUNT(*) AS occurrences
FROM page p, word w, occurrence o
WHERE p.page_id = o.page_id AND
w.word_id = o.word_id AND
w.word_word = \"$keyword\"
GROUP BY p.page_id
ORDER BY occurrences DESC
LIMIT $results" );
Answer 0 (score: 1)
If you want to get all the words, this condition will not let you do that:
w.word_word = \"$keyword\"
Your query can be rewritten as follows (note that the table is called occurrence, as in your question):
$sql = "SELECT p.page_url as url, COUNT(*) as occurrences "
. "FROM page p "
. "INNER JOIN occurrence o ON p.page_id = o.page_id "
. "INNER JOIN word w ON w.word_id = o.word_id "
. "GROUP BY p.page_id "
. "ORDER BY occurrences DESC "
. "LIMIT {$results}";
$result = mysql_query($sql);
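This fetches all the words in the word table, which (as far as I understand) gives you the result you need.
If you are only interested in a few words, you can use an IN clause (as Dev suggested in the comments), and your query becomes: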
$my_keywords = array('apple', 'banana');
// This produces: "apple", "banana" and assumes that all of your
// keywords are in lower case. If not, you can transform them to lower
// case or if you don't want that, remove the LOWER() function below
// from the WHERE
// Escape each keyword before it is placed inside the SQL string
$keywords = '"' . implode('","', array_map('mysql_real_escape_string', $my_keywords)) . '"';
$sql = "SELECT p.page_url as url, COUNT(*) as occurrences "
. "FROM page p "
. "INNER JOIN occurrence o ON p.page_id = o.page_id "
. "INNER JOIN word w ON w.word_id = o.word_id "
. "WHERE LOWER(w.word_word) IN ({$keywords}) "
. "GROUP BY p.page_id "
. "ORDER BY occurrences DESC "
. "LIMIT {$results}";
$result = mysql_query($sql);
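Finally, consider switching to mysqli or PDO instead of the mysql extension: the mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0.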
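Both queries above paste the keywords directly into the SQL string. Here is a minimal sketch of the IN-based query rewritten for PDO prepared statements, with one ? placeholder per keyword. The function name buildKeywordQuery is hypothetical, the table and column names are taken from the question, and it assumes PHP 7.4+ (arrow functions).

```php
<?php
// Hypothetical helper: builds the multi-keyword ranking query with one "?"
// placeholder per keyword, so keywords are bound and never pasted into the SQL.
// Table/column names follow the question's schema (page, word, occurrence).
// Requires PHP 7.4+ for the fn() arrow functions.
function buildKeywordQuery(array $keywords, int $limit): array
{
    // Normalise: trim and lower-case (to match LOWER(w.word_word)),
    // then drop empty strings and duplicates.
    $words = array_values(array_unique(array_filter(
        array_map(fn($w) => strtolower(trim($w)), $keywords),
        fn($w) => $w !== ''
    )));
    if (count($words) === 0) {
        throw new InvalidArgumentException('At least one keyword is required');
    }

    // "?, ?, ..." with one placeholder per keyword.
    $placeholders = implode(', ', array_fill(0, count($words), '?'));

    $sql = "SELECT p.page_url AS url, COUNT(*) AS occurrences "
         . "FROM page p "
         . "INNER JOIN occurrence o ON p.page_id = o.page_id "
         . "INNER JOIN word w ON w.word_id = o.word_id "
         . "WHERE LOWER(w.word_word) IN ($placeholders) "
         // Grouping by page_url too keeps the query valid under ONLY_FULL_GROUP_BY
         . "GROUP BY p.page_id, p.page_url "
         . "ORDER BY occurrences DESC "
         // $limit is an int, so it can be inlined safely
         . "LIMIT " . max(1, $limit);

    return [$sql, $words];
}

// Usage with PDO (DSN and credentials are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=search', $user, $pass);
// [$sql, $params] = buildKeywordQuery(['Apple', 'banana'], 10);
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
//     echo $row['url'], ': ', $row['occurrences'], "\n";
// }
```

LIMIT is appended as an integer rather than bound, because with PDO's default emulated prepares for MySQL every value passed to execute() is quoted as a string, and MySQL rejects a quoted LIMIT.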
HTH
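Answer 1 (score: 1)
I would use MATCH ... AGAINST, which is MySQL's optimized full-text search and should work better for something like a search engine. Have a look at full-text search: http://dev.mysql.com/doc/refman/5.5/en//fulltext-search.html
Note: the keyword column (w.word_word here) should have a FULLTEXT index in your MySQL table. This gives better search performance.
Example keyword input:
$keywords = '+Word +Word2 +Word3';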
SELECT p.page_url AS url,
COUNT(*) AS occurrences, MATCH(w.word_word) AGAINST('{$keywords}') AS keyword
FROM page p, occurrence o, word w
WHERE MATCH(w.word_word) AGAINST('{$keywords}' IN BOOLEAN MODE)
AND p.page_id = o.page_id AND w.word_id = o.word_id
GROUP BY p.page_id
ORDER BY occurrences DESC
LIMIT $results
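If the keywords come from a search box, something has to turn the raw input into a boolean-mode search string. Below is a minimal sketch; the function name toBooleanModeQuery is hypothetical. It assumes, as in the question's schema, that word_word holds a single word per row. In that case a + on every term would require all terms to appear in the same row, so by default the sketch leaves the terms without an operator.

```php
<?php
// Hypothetical helper: turns free text such as "Honda jazz, manual" into a
// search string for MATCH(w.word_word) AGAINST (... IN BOOLEAN MODE).
// Only letters, digits and underscores survive, so user input cannot inject
// boolean-mode operators (+ - < > ( ) ~ * " @) or quotes. Requires PHP 7.4+.
function toBooleanModeQuery(string $input, bool $requireAll = false): string
{
    // Split on every run of characters that is not a letter, digit or "_".
    $terms = preg_split('/[^\p{L}\p{N}_]+/u', $input, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // "+" marks a term as required. With no operator the terms are optional,
    // and when no term is required a row matches if it contains any of them.
    $prefix = $requireAll ? '+' : '';
    return implode(' ', array_map(fn($t) => $prefix . $t, $terms));
}

// Example:
// toBooleanModeQuery('Honda jazz, manual')       returns 'Honda jazz manual'
// toBooleanModeQuery('Honda jazz, manual', true) returns '+Honda +jazz +manual'
```

Because the result contains no quote characters it cannot break out of the '{$keywords}' literal, but binding it as a prepared-statement parameter is still the cleaner option.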
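Otherwise, in a non-optimized setup, an unoptimized query (too many GROUP BY and WHERE clauses and conditions) risks degrading server performance. Instead, you can use a regular expression in MySQL, for example:
w.word_word REGEXP '(honda)|(jazz)|(manual)'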
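When the keyword list comes from user input, the alternation has to be built safely. Below is a minimal sketch with a hypothetical function name, buildAlternationPattern. It assumes one word per row in word_word, hence the ^...$ anchors (without them, jazz would also match jazzy). It also keeps only ASCII letters, digits and underscores, so no regular-expression metacharacter ever needs escaping.

```php
<?php
// Hypothetical helper: builds one anchored alternation such as
// ^(honda|jazz|manual)$ for use in "w.word_word REGEXP ?".
// Assumes one word per row in word_word, hence the ^...$ anchors.
// Keywords are restricted to [a-z0-9_] after lower-casing, so no regex
// metacharacter ever needs escaping; anything else is skipped.
function buildAlternationPattern(array $keywords): string
{
    $words = [];
    foreach ($keywords as $keyword) {
        $keyword = strtolower(trim($keyword));
        if (preg_match('/^[a-z0-9_]+$/', $keyword) === 1) {
            $words[] = $keyword;
        }
    }
    if (count($words) === 0) {
        throw new InvalidArgumentException('No usable keyword');
    }
    return '^(' . implode('|', array_unique($words)) . ')$';
}

// Bind the pattern instead of pasting it into the SQL, e.g. with PDO:
// $stmt = $pdo->prepare("... WHERE w.word_word REGEXP ? ...");
// $stmt->execute([buildAlternationPattern(['honda', 'jazz', 'manual'])]);
```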
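With regular expressions (not recommended for large databases) you can also get good performance by building the condition in a loop instead of putting every keyword into a single REGEXP: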
$keywords = "keyword1,keyword2,keyword3";
$expl = explode(",", $keywords);

// One REGEXP condition per keyword; [[:<:]] and [[:>:]] are word-boundary
// markers (not supported from MySQL 8.0.4 on, which uses \b instead).
$conditions = array();
foreach ($expl as $keyone)
{
$keyone = mysql_real_escape_string(trim($keyone));
$conditions[] = "w.word_word REGEXP '[[:<:]]{$keyone}[[:>:]]'";
}
// Parentheses keep the ORs from breaking the AND conditions below.
$all = '(' . implode(' OR ', $conditions) . ')';

$sql = "SELECT p.page_url AS url,
COUNT(*) AS occurrences
FROM page p, word w, occurrence o
WHERE p.page_id = o.page_id AND
w.word_id = o.word_id AND
$all
GROUP BY p.page_id
ORDER BY occurrences DESC
LIMIT $results";
$result_query = mysql_query($sql);