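I have a search form. If a user enters a typo such as ager instead of anger, it should still show relevant results rather than "0 results found".
I came across the PHP levenshtein function, and the array example given for it is exactly what I want (except that the user may enter a sentence rather than a single word). However, I want to do this against a database and don't know how.
Here is my code: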
// $sql is defined elsewhere (not shown in the question).
if (!empty($search))
{
    try {
        $query = $this->_db->prepare($sql);
        $query->execute();
        if ($query->rowCount() > 0)
        {
            // Row count reported by MySQL for the previous SELECT.
            $foundRows = $this->_db->query("SELECT FOUND_ROWS()")->fetchColumn();
            $search_result = array();
            while ($row = $query->fetch(PDO::FETCH_ASSOC))
            {
                // Escape the quote first, then highlight the searched words in it.
                $cQuote = $this->highlightWords(htmlspecialchars($row['cQuotes']), $search);
                $search_result[] = array(
                    'success'   => true,
                    'totalRows' => $foundRows,
                    'cQuotes'   => $cQuote,
                    'vAuthor'   => $this->h($row['vAuthor']),
                    'vBookName' => $this->h($row['vBookName']),
                    'vRef'      => $this->h($row['vRef'])
                );
            }
            $response = json_encode($search_result);
            echo $response;
            return TRUE;
        }
        else
        {
            $ex = "No results found for " . $search;
            $this->errorMsg($ex);
        }
        $query->closeCursor();
    }
    catch (Exception $ex) {
        $ex = "Problem: " . $ex;
        $this->errorMsg($ex);
    }
}
else
{
    $ex = "Please enter something";
    $this->errorMsg($ex);
}
I should add that I am using MySQL + PDO.
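For reference, the array-based use of PHP's levenshtein() mentioned above can be sketched roughly as follows, applied word by word so that a whole sentence can be handled. This is only an illustration: the closestWord() helper, the $known word list and the distance limit of 2 are assumptions made for this example, not part of the code in the question.

```php
<?php
declare(strict_types=1);

/**
 * Return the candidate closest to $input by Levenshtein distance,
 * or null when no candidate is within $maxDistance edits.
 * $candidates is a hypothetical in-memory word list.
 */
function closestWord(string $input, array $candidates, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($candidates as $candidate) {
        $distance = levenshtein($input, $candidate);
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// Correct each word of a sentence independently.
$known = ['anger', 'apple', 'banana'];
foreach (preg_split('/\s+/', 'ager aple', -1, PREG_SPLIT_NO_EMPTY) as $word) {
    echo $word, ' => ', closestWord($word, $known) ?? '(no match)', "\n";
}
```

This works for a small list held in memory; with a database the word list is a table, which is what makes the problem harder.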
Answer 0 (score: 1)
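To achieve this, you need three things:
1. A LEVENSHTEIN function in MySQL (it is not built in, so it has to be added as a stored function)
2. A table that indexes every word of your texts
3. A search query that uses LEFT JOIN and HAVING clauses
Example database schema:
text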
+---------+----------------------------------------------+
| text_id | text                                         |
+---------+----------------------------------------------+
|       1 | The quick brown fox jumps over the lazy dog  |
|       2 | The slow brown foxes jump over the lazy dogs |
+---------+----------------------------------------------+
word
+-------+---------+
| word  | text_id |
+-------+---------+
| fox   |       1 |
| foxes |       2 |
| dog   |       1 |
| dogs  |       2 |
+-------+---------+
With this in place, say someone searches for "foxs dogg". You would build a query like this:
SELECT text FROM text
LEFT JOIN word w1 ON w1.text_id = text.text_id AND LEVENSHTEIN(w1.word, "foxs") < 3
LEFT JOIN word w2 ON w2.text_id = text.text_id AND LEVENSHTEIN(w2.word, "dogg") < 3
GROUP BY text.text_id
HAVING COUNT(*) = 2
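...where:
- there is one LEFT JOIN per search word (e.g. foxs and dogg)
- the HAVING clause contains the total number of words (e.g. HAVING COUNT(*) = 2)
- the maximum distance allowed for each word is set in the join condition (e.g. LEVENSHTEIN(...) < 3)
The above will return both entries.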
Here is another example:
SELECT text FROM text
LEFT JOIN word w1 ON w1.text_id = text.text_id AND LEVENSHTEIN(w1.word, "foxs") < 3
LEFT JOIN word w2 ON w2.text_id = text.text_id AND LEVENSHTEIN(w2.word, "slows") < 3
GROUP BY text.text_id
HAVING COUNT(*) = 2
The above returns only text_id = 2.
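Since the query has one join per search word, it has to be generated from the user's input. A minimal PHP sketch of that step follows, assuming the text and word tables above and a LEVENSHTEIN(a, b) stored function already installed in MySQL. The buildFuzzySearchQuery() name and its $maxDistance parameter are made up for this illustration. Note that it is a variant rather than the exact query above: it uses inner JOINs instead of LEFT JOIN plus HAVING, because COUNT(*) counts joined rows rather than matched words, and an inner join already drops any text in which a search word has no close match.

```php
<?php
declare(strict_types=1);

/**
 * Build a fuzzy search query with one JOIN per search word.
 * Assumes a stored function LEVENSHTEIN(a, b) exists in the database
 * (MySQL has no built-in one) and the `text` / `word` tables shown above.
 *
 * @return array{0: string, 1: array<int, string>} SQL and positional parameters
 */
function buildFuzzySearchQuery(string $search, int $maxDistance = 2): array
{
    // Split on whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false || $words === []) {
        throw new InvalidArgumentException('Please enter something');
    }

    $joins = [];
    $params = [];
    foreach (array_values($words) as $i => $word) {
        $alias = 'w' . ($i + 1);
        // The search word is bound as a parameter, never concatenated.
        $joins[] = "JOIN word {$alias} ON {$alias}.text_id = text.text_id"
                 . " AND LEVENSHTEIN({$alias}.word, ?) <= {$maxDistance}";
        $params[] = $word;
    }

    // GROUP BY collapses texts that match through several word combinations.
    $sql = "SELECT text.text_id, text.text FROM text\n"
         . implode("\n", $joins) . "\n"
         . "GROUP BY text.text_id, text.text";

    return [$sql, $params];
}

[$sql, $params] = buildFuzzySearchQuery('foxs dogg');
echo $sql, "\n";
```

With PDO, the result would be run as $stmt = $pdo->prepare($sql); $stmt->execute($params);, where $pdo stands for an existing PDO connection. The distance limit is interpolated into the SQL only because it is typed as an int; the search words always go through placeholders.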
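Now, before you go wild implementing this, you should know that on a table with millions of entries (words), multiple JOIN clauses like the ones above will have a very heavy performance impact, since LEVENSHTEIN has to be computed for every candidate row and cannot use an index.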
While this is a working example, you really should look for an already implemented search solution, such as Solr's SpellCheck component.