MySQL子串模糊搜索

时间:2016-08-25 12:53:51

标签: mysql fuzzy-search

我有一个非常有趣的问题:

我有一个MySQL表'Venue',其中包含字段:'name','addressLine1','addressLine2','addressLine3','city','country','description';所有字段都是VARCHAR。 'description'是一个较大的文本字段。

我想做的是在桌子上进行模糊搜索。到目前为止我正在使用:

SELECT * FROM Venue WHERE MATCH(name, addressLine1,..., description) AGAINST("London" IN NATURAL LANGUAGE MODE).

我也可以根据MATCH得分对此查询进行排序。

这很棒,但有一些明显的问题: 1)如果用户键入“lond”,则不返回任何内容 2)如果用户键入“lodnod”,则不返回任何内容。

我想到的一个替代解决方案是使用Levenshtein扩展(2个移动) - 因此对于搜索术语“lodnod”,查询将如下:

SELECT * FROM Venue WHERE Venue. name LIKE '%__lodnod%'
OR Venue.addressLine1 LIKE '%__lodnod%'
OR Venue.addressLine2 LIKE '%__lodnod%'
OR Venue.addressLine3 LIKE '%__lodnod%'
OR Venue.city LIKE '%__lodnod%'
OR Venue.county LIKE '%__lodnod%'
OR Venue.country LIKE '%__lodnod%'
OR Venue. name LIKE '%lodnod%'
OR Venue.addressLine1 LIKE '%lodnod%'
OR Venue.addressLine2 LIKE '%lodnod%'
OR Venue.addressLine3 LIKE '%lodnod%'
OR Venue.city LIKE '%lodnod%'
OR Venue.county LIKE '%lodnod%'
OR Venue.country LIKE '%lodnod%'
OR Venue. name LIKE '%_lodnod%'
OR Venue.addressLine1 LIKE '%_lodnod%'
OR Venue.addressLine2 LIKE '%_lodnod%'
OR Venue.addressLine3 LIKE '%_lodnod%'
OR Venue.city LIKE '%_lodnod%'
OR Venue.county LIKE '%_lodnod%'
OR Venue.country LIKE '%_lodnod%'
OR Venue. name LIKE '%_odnod%'
OR Venue.addressLine1 LIKE '%_odnod%'
OR Venue.addressLine2 LIKE '%_odnod%'
OR Venue.addressLine3 LIKE '%_odnod%'
OR Venue.city LIKE '%_odnod%'
OR Venue.county LIKE '%_odnod%'
OR Venue.country LIKE '%_odnod%'
OR Venue. name LIKE '%__odnod%'
OR Venue.addressLine1 LIKE '%__odnod%'
OR Venue.addressLine2 LIKE '%__odnod%'
OR Venue.addressLine3 LIKE '%__odnod%'
OR Venue.city LIKE '%__odnod%'
OR Venue.county LIKE '%__odnod%'
OR Venue.country LIKE '%__odnod%'
OR Venue. name LIKE '%_l_odnod%'
OR Venue.addressLine1 LIKE '%_l_odnod%'
OR Venue.addressLine2 LIKE '%_l_odnod%'
OR Venue.addressLine3 LIKE '%_l_odnod%'
OR Venue.city LIKE '%_l_odnod%'
OR Venue.county LIKE '%_l_odnod%'
OR Venue.country LIKE '%_l_odnod%'
OR Venue. name LIKE '%_ldnod%'
OR Venue.addressLine1 LIKE '%_ldnod%'
OR Venue.addressLine2 LIKE '%_ldnod%'
OR Venue.addressLine3 LIKE '%_ldnod%'
OR Venue.city LIKE '%_ldnod%'
OR Venue.county LIKE '%_ldnod%'
OR Venue.country LIKE '%_ldnod%'
OR Venue. name LIKE '%_l_dnod%'
OR Venue.addressLine1 LIKE '%_l_dnod%'
OR Venue.addressLine2 LIKE '%_l_dnod%'
OR Venue.addressLine3 LIKE '%_l_dnod%'
OR Venue.city LIKE '%_l_dnod%'
OR Venue.county LIKE '%_l_dnod%'
OR Venue.country LIKE '%_l_dnod%'
OR Venue. name LIKE '%_lo_dnod%'
OR Venue.addressLine1 LIKE '%_lo_dnod%'
OR Venue.addressLine2 LIKE '%_lo_dnod%'
OR Venue.addressLine3 LIKE '%_lo_dnod%'
OR Venue.city LIKE '%_lo_dnod%'
OR Venue.county LIKE '%_lo_dnod%'
OR Venue.country LIKE '%_lo_dnod%'
OR Venue. name LIKE '%_lonod%'
OR Venue.addressLine1 LIKE '%_lonod%'
OR Venue.addressLine2 LIKE '%_lonod%'
OR Venue.addressLine3 LIKE '%_lonod%'
OR Venue.city LIKE '%_lonod%'
OR Venue.county LIKE '%_lonod%'
OR Venue.country LIKE '%_lonod%'
OR Venue. name LIKE '%_lo_nod%'
OR Venue.addressLine1 LIKE '%_lo_nod%'
OR Venue.addressLine2 LIKE '%_lo_nod%'
OR Venue.addressLine3 LIKE '%_lo_nod%'
OR Venue.city LIKE '%_lo_nod%'
OR Venue.county LIKE '%_lo_nod%'
OR Venue.country LIKE '%_lo_nod%'
OR Venue. name LIKE '%_lod_nod%'
OR Venue.addressLine1 LIKE '%_lod_nod%'
OR Venue.addressLine2 LIKE '%_lod_nod%'
OR Venue.addressLine3 LIKE '%_lod_nod%'
OR Venue.city LIKE '%_lod_nod%'
OR Venue.county LIKE '%_lod_nod%'
OR Venue.country LIKE '%_lod_nod%'
OR Venue. name LIKE '%_lodod%'
OR Venue.addressLine1 LIKE '%_lodod%'
OR Venue.addressLine2 LIKE '%_lodod%'
OR Venue.addressLine3 LIKE '%_lodod%'
OR Venue.city LIKE '%_lodod%'
OR Venue.county LIKE '%_lodod%'
OR Venue.country LIKE '%_lodod%'
OR Venue. name LIKE '%_lod_od%'
OR Venue.addressLine1 LIKE '%_lod_od%'
OR Venue.addressLine2 LIKE '%_lod_od%'
OR Venue.addressLine3 LIKE '%_lod_od%'
OR Venue.city LIKE '%_lod_od%'
OR Venue.county LIKE '%_lod_od%'
OR Venue.country LIKE '%_lod_od%'
OR Venue. name LIKE '%_lodn_od%'
OR Venue.addressLine1 LIKE '%_lodn_od%'
OR Venue.addressLine2 LIKE '%_lodn_od%'
OR Venue.addressLine3 LIKE '%_lodn_od%'
OR Venue.city LIKE '%_lodn_od%'
OR Venue.county LIKE '%_lodn_od%'
OR Venue.country LIKE '%_lodn_od%'
OR Venue. name LIKE '%_lodnd%'
OR Venue.addressLine1 LIKE '%_lodnd%'
OR Venue.addressLine2 LIKE '%_lodnd%'
OR Venue.addressLine3 LIKE '%_lodnd%'
OR Venue.city LIKE '%_lodnd%'
OR Venue.county LIKE '%_lodnd%'
OR Venue.country LIKE '%_lodnd%'
OR Venue. name LIKE '%_lodn_d%'
OR Venue.addressLine1 LIKE '%_lodn_d%'
OR Venue.addressLine2 LIKE '%_lodn_d%'
OR Venue.addressLine3 LIKE '%_lodn_d%'
OR Venue.city LIKE '%_lodn_d%'
OR Venue.county LIKE '%_lodn_d%'
OR Venue.country LIKE '%_lodn_d%'
OR Venue. name LIKE '%_lodno_d%'
OR Venue.addressLine1 LIKE '%_lodno_d%'
OR Venue.addressLine2 LIKE '%_lodno_d%'
OR Venue.addressLine3 LIKE '%_lodno_d%'
OR Venue.city LIKE '%_lodno_d%'
OR Venue.county LIKE '%_lodno_d%'
OR Venue.country LIKE '%_lodno_d%'
OR Venue. name LIKE '%_lodno%'
OR Venue.addressLine1 LIKE '%_lodno%'
OR Venue.addressLine2 LIKE '%_lodno%'
OR Venue.addressLine3 LIKE '%_lodno%'
OR Venue.city LIKE '%_lodno%'
OR Venue.county LIKE '%_lodno%'
OR Venue.country LIKE '%_lodno%'
OR Venue. name LIKE '%_lodno_%'
OR Venue.addressLine1 LIKE '%_lodno_%'
OR Venue.addressLine2 LIKE '%_lodno_%'
OR Venue.addressLine3 LIKE '%_lodno_%'
OR Venue.city LIKE '%_lodno_%'
OR Venue.county LIKE '%_lodno_%'
OR Venue.country LIKE '%_lodno_%'
OR Venue. name LIKE '%_lodnod_%'
OR Venue.addressLine1 LIKE '%_lodnod_%'
OR Venue.addressLine2 LIKE '%_lodnod_%'
OR Venue.addressLine3 LIKE '%_lodnod_%'
OR Venue.city LIKE '%_lodnod_%'
OR Venue.county LIKE '%_lodnod_%'
OR Venue.country LIKE '%_lodnod_%'
OR Venue. name LIKE '%dnod%'
OR Venue.addressLine1 LIKE '%dnod%'
OR Venue.addressLine2 LIKE '%dnod%'
OR Venue.addressLine3 LIKE '%dnod%'
OR Venue.city LIKE '%dnod%'
OR Venue.county LIKE '%dnod%'
OR Venue.country LIKE '%dnod%'
OR Venue. name LIKE '%_dnod%'
OR Venue.addressLine1 LIKE '%_dnod%'
OR Venue.addressLine2 LIKE '%_dnod%'
OR Venue.addressLine3 LIKE '%_dnod%'
OR Venue.city LIKE '%_dnod%'
OR Venue.county LIKE '%_dnod%'
OR Venue.country LIKE '%_dnod%'
OR Venue. name LIKE '%o_dnod%'
OR Venue.addressLine1 LIKE '%o_dnod%'
OR Venue.addressLine2 LIKE '%o_dnod%'
OR Venue.addressLine3 LIKE '%o_dnod%'
OR Venue.city LIKE '%o_dnod%'
OR Venue.county LIKE '%o_dnod%'
OR Venue.country LIKE '%o_dnod%'
OR Venue. name LIKE '%onod%'
OR Venue.addressLine1 LIKE '%onod%'
OR Venue.addressLine2 LIKE '%onod%'
OR Venue.addressLine3 LIKE '%onod%'
OR Venue.city LIKE '%onod%'
OR Venue.county LIKE '%onod%'
OR Venue.country LIKE '%onod%'
OR Venue. name LIKE '%o_nod%'
OR Venue.addressLine1 LIKE '%o_nod%'
OR Venue.addressLine2 LIKE '%o_nod%'
OR Venue.addressLine3 LIKE '%o_nod%'
OR Venue.city LIKE '%o_nod%'
OR Venue.county LIKE '%o_nod%'
OR Venue.country LIKE '%o_nod%'
OR Venue. name LIKE '%od_nod%'
OR Venue.addressLine1 LIKE '%od_nod%'
OR Venue.addressLine2 LIKE '%od_nod%'
OR Venue.addressLine3 LIKE '%od_nod%'
OR Venue.city LIKE '%od_nod%'
OR Venue.county LIKE '%od_nod%'
OR Venue.country LIKE '%od_nod%'
OR Venue. name LIKE '%odod%'
OR Venue.addressLine1 LIKE '%odod%'
OR Venue.addressLine2 LIKE '%odod%'
OR Venue.addressLine3 LIKE '%odod%'
OR Venue.city LIKE '%odod%'
OR Venue.county LIKE '%odod%'
OR Venue.country LIKE '%odod%'
OR Venue. name LIKE '%od_od%'
OR Venue.addressLine1 LIKE '%od_od%'
OR Venue.addressLine2 LIKE '%od_od%'
OR Venue.addressLine3 LIKE '%od_od%'
OR Venue.city LIKE '%od_od%'
OR Venue.county LIKE '%od_od%'
OR Venue.country LIKE '%od_od%'
OR Venue. name LIKE '%odn_od%'
OR Venue.addressLine1 LIKE '%odn_od%'
OR Venue.addressLine2 LIKE '%odn_od%'
OR Venue.addressLine3 LIKE '%odn_od%'
OR Venue.city LIKE '%odn_od%'
OR Venue.county LIKE '%odn_od%'
OR Venue.country LIKE '%odn_od%'
OR Venue. name LIKE '%odnd%'
OR Venue.addressLine1 LIKE '%odnd%'
OR Venue.addressLine2 LIKE '%odnd%'
OR Venue.addressLine3 LIKE '%odnd%'
OR Venue.city LIKE '%odnd%'
OR Venue.county LIKE '%odnd%'
OR Venue.country LIKE '%odnd%'
OR Venue. name LIKE '%odn_d%'
OR Venue.addressLine1 LIKE '%odn_d%'
OR Venue.addressLine2 LIKE '%odn_d%'
...(cut short because of maximum 30000 character limit)
OR Venue.city LIKE '%lodno_%'
OR Venue.county LIKE '%lodno_%'
OR Venue.country LIKE '%lodno_%'
OR Venue. name LIKE '%lodno__%'
OR Venue.addressLine1 LIKE '%lodno__%'
OR Venue.addressLine2 LIKE '%lodno__%'
OR Venue.addressLine3 LIKE '%lodno__%'
OR Venue.city LIKE '%lodno__%'
OR Venue.county LIKE '%lodno__%'
OR Venue.country LIKE '%lodno__%'
OR Venue. name LIKE '%lodno_d_%'
OR Venue.addressLine1 LIKE '%lodno_d_%'
OR Venue.addressLine2 LIKE '%lodno_d_%'
OR Venue.addressLine3 LIKE '%lodno_d_%'
OR Venue.city LIKE '%lodno_d_%'
OR Venue.county LIKE '%lodno_d_%'
OR Venue.country LIKE '%lodno_d_%'
OR Venue. name LIKE '%lodno%'
OR Venue.addressLine1 LIKE '%lodno%'
OR Venue.addressLine2 LIKE '%lodno%'
OR Venue.addressLine3 LIKE '%lodno%'
OR Venue.city LIKE '%lodno%'
OR Venue.county LIKE '%lodno%'
OR Venue.country LIKE '%lodno%'
OR Venue. name LIKE '%lodnod__%'
OR Venue.addressLine1 LIKE '%lodnod__%'
OR Venue.addressLine2 LIKE '%lodnod__%'
OR Venue.addressLine3 LIKE '%lodnod__%'
OR Venue.city LIKE '%lodnod__%'
OR Venue.county LIKE '%lodnod__%'
OR Venue.country LIKE '%lodnod__%'
OR Venue. name LIKE '%lodnod_%'
OR Venue.addressLine1 LIKE '%lodnod_%'
OR Venue.addressLine2 LIKE '%lodnod_%'
OR Venue.addressLine3 LIKE '%lodnod_%'
OR Venue.city LIKE '%lodnod_%'
OR Venue.county LIKE '%lodnod_%'
OR Venue.country LIKE '%lodnod_%'

显然,这个查询是 1)将成为服务器上的巨大负载,响应时间将受到影响。 2)我不知道对此进行评分并因此对其进行排序。

有没有办法更好地做到这一点?

2 个答案:

答案 0 :(得分:1)

不要使用mysql这个东西。您需要一个针对搜索材料编制索引而优化的引擎。您可以使用elasticsearchsolr来实现此目的,并使用mysql对其进行索引。 Mysql根本就不是为此而设计的

答案 1 :(得分:0)

MySQL's documentation page for full text search functions有一些有用的信息。尽管上面的评论说明了,但您不需要将自己局限于MySQL中的MyISAM表类型。正如其他人所说,MySQL可能不是最好的选择,但是你可以采取一些措施使其更加可行。

列聚合

如果您发现总是对多个列执行此类搜索,则可能会有一个“可搜索”列,其中只包含来自所有其他列的标记。您需要一些机制来使其保持最新,但ON INSERTON UPDATE触发器就足够了。这可以简化索引并使您的查询更易于阅读(并且希望加速响应),但不能解决您的初始问题。

这种“可搜索”专栏创意的一个优点是,您可以安静地添加常见拼写错误的标记,以方便用户使用。您的搜索日志在这里很有帮助。

考虑使用Soundex

最近取得成功的一个解决方案是使用soundex而不是Levenshtein扩展。参见,例如,Rob Gravelle(2015)"MySQL Fuzzy Text Searching Using the SOUNDEX Function数据库期刊

用户定义函数(UDF)

另一个选择是在MySQL实现中添加用户定义函数(UDF)。例如,请this Stack Overflow questionthis blog post

然而,你接近它,你会发现Levenshtein的真正模糊搜索是一个缓慢的野兽。祝你好运!