MySQL uses collations for string comparison, because some characters should be treated as equal.
Example:
SELECT 'é' = 'e' COLLATE utf8_unicode_ci;
SELECT 'oe' = 'œ' COLLATE utf8_unicode_ci;
Both return true.
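Now, how can I do the same with the quote (') and the apostrophe (’)?
They are not the same character, and the correct character to use when writing "it's" or "l'oiseau" (French) is the apostrophe in both cases.
The fact is that neither utf8_general_ci nor utf8_unicode_ci treats the two as equal.
The easy solution is to store everything with the quote and replace every apostrophe when a user runs a search, but that is wrong.
The real solution would be to create a custom collation based on utf8_unicode_ci that marks the two as equivalent, but that requires editing XML configuration files and restarting the database, which is not always possible.
How would you do it?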
Answer 0 (score: 1)
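A custom collation seems the most appropriate, but if that is not possible, perhaps you can tailor your searches to use regular expressions. It is not exactly ideal, but it may be useful in some situations. At least it lets you store the data in the correct form (without replacing quotes) and do the replacement only on the search query itself: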
INSERT INTO mytable VALUES
(1, 'Though this be madness, yet there is method in ''t'),
(2, 'Though this be madness, yet there is method in ’t'),
(3, 'There ’s daggers in men’s smiles'),
(4, 'There ’s daggers in men''s smiles');
SELECT * FROM mytable WHERE data REGEXP 'There [\'’]+s daggers in men[\'’]+s smiles';
+----+----------------------------------+
| id | data                             |
+----+----------------------------------+
|  3 | There ’s daggers in men’s smiles |
|  4 | There ’s daggers in men's smiles |
+----+----------------------------------+
SELECT * FROM mytable WHERE data REGEXP 'Though this be madness, yet there is method in [\'’]+t';
+----+---------------------------------------------------+
| id | data                                              |
+----+---------------------------------------------------+
|  1 | Though this be madness, yet there is method in 't |
|  2 | Though this be madness, yet there is method in ’t |
+----+---------------------------------------------------+
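If the search text comes from a user, the pattern can be built in application code instead of by hand. The following is only a rough sketch in Python, not part of the original answer: the function name build_regexp and the set of escaped metacharacters are my own assumptions, and it reuses the same ['’]+ character class as the queries above.

```python
# Hypothetical helper: turns a plain search string into a pattern for
# MySQL's REGEXP in which the ASCII quote (U+0027) and the typographic
# apostrophe (U+2019) are interchangeable.

APOSTROPHES = "'\u2019"

# Characters escaped with a backslash so they match literally.
# This set is an assumption covering the common regex metacharacters.
METACHARACTERS = set("\\.^$*+?()[]{}|")


def build_regexp(search: str) -> str:
    parts = []
    for ch in search:
        if ch in APOSTROPHES:
            # Same character class as in the hand-written queries.
            parts.append("['\u2019]+")
        elif ch in METACHARACTERS:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


if __name__ == "__main__":
    pattern = build_regexp("There 's daggers in men's smiles")
    print(pattern)
    # Pass the pattern as a bound parameter rather than pasting it into
    # the SQL text, so the driver takes care of quoting, e.g.:
    #   cursor.execute("SELECT * FROM mytable WHERE data REGEXP %s", (pattern,))
```

The stored data stays untouched; only the search string is rewritten, which is the same trade-off as in the queries above.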