A good algorithm for searching a DB for a given string

Time: 2015-04-06 08:14:55

Tags: mysql database search text

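I am developing a web application (PHP + MySQL) in which users can search for other users by typing a search string.

I need to match the user's input string against 2 columns (username and fullName) of the "User" table in my database and return the most relevant matches (the top 20 or 50). Ideally, I also need to account for typos.

How should I approach this? I don't intend to reinvent the wheel here.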

2 answers:

Answer 0: (score: 1)

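You can do this with MySQL full-text search:

Please take a look at this, this, and this article.

I will explain Boolean Full Text Search here, but I suggest you also read about Full Text Search using Query Expansion.

Let's look at the sample table given on dev.mysql.com: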

mysql> select * from articles;
+----+-----------------------+------------------------------------------+
| id | title                 | body                                     |
+----+-----------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial   | DBMS stands for DataBase ...             |
|  2 | How To Use MySQL Well | After you went through a ...             |
|  3 | Optimizing MySQL      | In this tutorial we will show ...        |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ...      |
|  5 | MySQL vs. YourSQL     | In the following database comparison ... |
|  6 | MySQL Security        | When configured properly, MySQL ...      |
+----+-----------------------+------------------------------------------+

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('"database comparison"' IN BOOLEAN MODE);

+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+

When the words are quoted, their order matters:

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('"comparison database"' IN BOOLEAN MODE);

Empty set (0.01 sec)

When we remove the quotes, it searches for rows that contain either the word "database" or the word "comparison":

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('database comparison' IN BOOLEAN MODE);

+----+---------------------+------------------------------------------+
| id | title               | body                                     |
+----+---------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL   | In the following database comparison ... |
+----+---------------------+------------------------------------------+

Now the order of the words does not matter:

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('comparison database' IN BOOLEAN MODE);

+----+---------------------+------------------------------------------+
| id | title               | body                                     |
+----+---------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL   | In the following database comparison ... |
+----+---------------------+------------------------------------------+

If we want rows that contain either the word "PostgreSQL" or the phrase "database comparison", we should use this query:

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('PostgreSQL "database comparison"' IN BOOLEAN MODE);

+----+---------------------+------------------------------------------+
| id | title               | body                                     |
+----+---------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL   | In the following database comparison ... |
+----+---------------------+------------------------------------------+
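
As a rough sketch of how an application could assemble a search string like the ones above: the hypothetical helper `build_boolean_query` below joins single words and exact phrases with spaces and wraps each phrase in double quotes. The helper name and its parameters are assumptions for illustration and are not part of MySQL. Double quotes inside a phrase are dropped, because the quote character is what delimits a phrase in boolean mode.

```python
def build_boolean_query(words, phrases=()):
    """Join words and quoted phrases into one boolean-mode search string."""
    # Plain words are matched individually, in any order.
    parts = [w.strip() for w in words if w.strip()]
    for phrase in phrases:
        # Remove embedded double quotes and collapse whitespace.
        cleaned = " ".join(phrase.replace('"', " ").split())
        if cleaned:
            # A quoted phrase must match with its words in this exact order.
            parts.append(f'"{cleaned}"')
    return " ".join(parts)


print(build_boolean_query(["PostgreSQL"], ["database comparison"]))
# prints: PostgreSQL "database comparison"
```

The resulting string would then be passed as the argument of AGAINST (... IN BOOLEAN MODE), preferably as a bound parameter rather than by string concatenation.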

Fiddle To Try

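Make sure the words you search for are not in the list of stopwords, because stopwords are ignored.
(For example, common words such as 'is' and 'the' are stopwords and are ignored.)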
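
One way to act on this in the application is to check the input before running the query and tell the user which words will not take part in the search. The sketch below is only an illustration: `SAMPLE_STOPWORDS` is a small hand-picked subset, not MySQL's actual list (which depends on the storage engine and server configuration), and `split_stopwords` is a hypothetical helper name.

```python
# Illustrative subset only; the real list depends on the MySQL
# storage engine and on how the server is configured.
SAMPLE_STOPWORDS = {"a", "an", "is", "the", "of", "to"}


def split_stopwords(user_input, stopwords=SAMPLE_STOPWORDS):
    """Split the input into (searchable words, ignored stopwords)."""
    kept, ignored = [], []
    for word in user_input.split():
        # Compare case-insensitively, but keep the word as typed.
        if word.lower() in stopwords:
            ignored.append(word)
        else:
            kept.append(word)
    return kept, ignored


kept, ignored = split_stopwords("the MySQL tutorial is good")
print(kept)     # words that can be searched for
print(ignored)  # words the full-text search would skip
```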

To improve the ordering of results in boolean mode, you can use the following query.

(Assuming the user's input string contains two words in total:)

SELECT column_names, MATCH (text) AGAINST ('word1 word2')
AS col1 FROM table1
WHERE MATCH (text) AGAINST ('+word1 +word2' in boolean mode) 
order by col1 desc;

(If the user's input string contains three words:)

SELECT column_names, MATCH (text) AGAINST ('word1 word2 word3')
AS col1 FROM table1
WHERE MATCH (text) AGAINST ('+word1 +word2 +word3' in boolean mode) 
order by col1 desc;

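With the first MATCH() we get the score from the non-boolean search mode (which is more distinctive). The second MATCH() ensures that we really get back only the results we want (rows containing all 3 words).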
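
Since the number of words varies with the input, the two search strings are easier to generate than to write by hand. The following is a minimal sketch under a few assumptions: `build_ranked_search` is a hypothetical helper, `column_names`, `table1` and `text` are the placeholders used in the queries above, and `%s` is the parameter marker used by common Python MySQL drivers such as PyMySQL (other drivers may use a different marker).

```python
import re


def build_ranked_search(user_input):
    """Return (sql, params) for a ranked search that requires every word."""
    # Keep only runs of word characters (letters, digits, underscore),
    # so boolean operators typed by the user (+, -, ", *) are dropped.
    words = re.findall(r"\w+", user_input)
    if not words:
        return None
    # First MATCH(): plain word list, used only to compute the score.
    natural = " ".join(words)
    # Second MATCH(): every word prefixed with + so that all are required.
    required = " ".join("+" + w for w in words)
    sql = (
        "SELECT column_names, MATCH (text) AGAINST (%s) AS col1 "
        "FROM table1 "
        "WHERE MATCH (text) AGAINST (%s IN BOOLEAN MODE) "
        "ORDER BY col1 DESC"
    )
    return sql, (natural, required)


sql, params = build_ranked_search("word1 word2 word3")
print(params)
# prints: ('word1 word2 word3', '+word1 +word2 +word3')
```

Passing the search strings as bound parameters, instead of pasting them into the SQL text, also keeps user input out of the statement itself.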

Answer 1: (score: -1)

I would do a LIKE search on both columns.

My query would look like this:

SELECT * FROM user WHERE username LIKE '%{input}%' OR fullName LIKE '%{input}%' LIMIT 20;
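
If the input is placed inside a LIKE pattern, any `%` or `_` typed by the user would act as a wildcard. A minimal sketch of one way to handle that, assuming MySQL's default backslash escape character and a driver that uses `%s` parameter markers; `like_pattern` is a hypothetical helper, and the column names are the ones given in the question:

```python
def like_pattern(user_input):
    """Escape LIKE wildcards and wrap the input for a 'contains' match."""
    escaped = (
        user_input.replace("\\", "\\\\")  # escape the escape character first
        .replace("%", "\\%")              # literal percent sign
        .replace("_", "\\_")              # literal underscore
    )
    return f"%{escaped}%"


# The pattern is bound as a parameter instead of being pasted into the SQL.
sql = "SELECT * FROM user WHERE username LIKE %s OR fullName LIKE %s LIMIT 20"
pattern = like_pattern("50%_off")
params = (pattern, pattern)
print(pattern)
# prints: %50\%\_off%
```

Note that a pattern with a leading `%` generally cannot use an ordinary index, so this approach may become slow on a large table.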

You would need to write a custom cleanup algorithm to correct misspelled input.
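
The answer does not say what such an algorithm would look like. One common building block, shown here only as an assumption about what could be meant, is the Levenshtein edit distance: candidate names fetched from the database are ranked by how many single-character edits separate them from the input. The function names and the `max_distance` threshold of 2 are illustrative choices.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def closest_names(query, candidates, max_distance=2):
    """Return candidates within max_distance edits of query, nearest first."""
    scored = [(edit_distance(query.lower(), c.lower()), c) for c in candidates]
    return [name for distance, name in sorted(scored) if distance <= max_distance]


print(closest_names("jon", ["john", "mary"]))
# prints: ['john']
```

Comparing the input against every row this way does not scale to a large table, so in practice the candidates would first be narrowed down by a cheaper query.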