Question

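I need to find all entries in a table that contain only certain Japanese UTF-8 characters.

For example, I want every field that contains only the characters 一 (1) and 二 (2).

I am using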

SELECT combi_id, keb FROM combi WHERE keb REGEXP '[二一]+'

but it matches many other fields that contain different characters. What am I doing wrong?

Here is the table:

CREATE TABLE IF NOT EXISTS `combi` (
      `combi_id` int(11) NOT NULL auto_increment,
      `ent_seq` int(11) NOT NULL,
      `reb` text NOT NULL,
      `keb` text NOT NULL,
      `ant` text NOT NULL,
      `ke_pri` text NOT NULL,
      `re_pri` text NOT NULL,
      `re_restr` text NOT NULL,
      `stagr` text NOT NULL,
      `s_inf` text NOT NULL,
      `lsource` text NOT NULL,
      `gloss` text NOT NULL,
      `xref` text NOT NULL,
      `stagk` text NOT NULL,
      PRIMARY KEY  (`combi_id`)
    ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=146740 ;

Here is a sample data row:

(22, 1000225, 'あからさま', '明白|偸閑|白地', '', '', '', '', '', '', '', 'plain|frank|candid|open|direct|straightforward|unabashed|blatant|flagrant', '', ''),

Thank you very much for your help!

Answer 1

If you want to match columns that consist of only these characters, you should use

SELECT combi_id, keb FROM combi WHERE keb REGEXP '^[二一]+$'

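Note the ^ at the start and the $ at the end, which mean "start of string" and "end of string" respectively. Without them, the regular expression can match at any position in the value.

Edit: a test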

mysql> select * from test;
+--------+
| f1     |
+--------+
| 二     |
| 東京   |
| 人     |
| 丸     |
+--------+
4 rows in set (0.00 sec)

mysql> select * from test where f1 regexp _utf8'[一二]';
+--------+
| f1     |
+--------+
| 二     |
| 東京   |
| 人     |
| 丸     |
+--------+
4 rows in set (0.00 sec)

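Wow, indeed, character classes in MySQL regexps look seriously broken for multibyte characters: the class matched every row. MySQL versions before 8.0 evaluate REGEXP byte by byte, so [一二] matches any single byte of either character rather than a whole character. Alternation works, though: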

mysql> select * from test where f1 regexp _utf8'(一|二)';
+------+
| f1   |
+------+
| 二   |
+------+
1 row in set (0.00 sec)
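
To see why the character class misfires while the alternation does not, here is a minimal Python sketch that emulates byte-wise matching by running the patterns over UTF-8 bytes. It is not MySQL itself: the helper names bytewise_search and matching_rows are made up for illustration, and it assumes the server compares raw UTF-8 bytes the way older MySQL versions do.

```python
import re


def bytewise_search(pattern, text):
    """Search the UTF-8 bytes of text, emulating a byte-oriented regex engine."""
    return re.search(pattern.encode("utf-8"), text.encode("utf-8")) is not None


def matching_rows(pattern, rows):
    """Return the rows whose bytes match the pattern (hypothetical helper)."""
    return [row for row in rows if bytewise_search(pattern, row)]


rows = ["二", "東京", "人", "丸"]  # same values as the test table above

# 一 is E4 B8 80 and 二 is E4 BA 8C in UTF-8, so the class [一二] becomes
# a class of single bytes. Every row contains at least one of those bytes.
class_hits = matching_rows("[一二]", rows)
print(class_hits)  # ['二', '東京', '人', '丸']

# Alternation keeps each character's three bytes together as one sequence.
alt_hits = matching_rows("(一|二)", rows)
print(alt_hits)  # ['二']

# Anchoring the class is still wrong byte-wise: 人 (E4 BA BA) and 丸 (E4 B8 B8)
# are built entirely from bytes that also occur in 一 and 二.
anchored_class_hits = matching_rows("^[二一]+$", rows)
print(anchored_class_hits)  # ['二', '人', '丸']

# Anchoring the alternation accepts only values made of 一 and 二.
anchored_alt_hits = matching_rows("^(一|二)+$", rows)
print(anchored_alt_hits)  # ['二']
```

Under the same assumption, combining the anchors with the alternation, as in keb REGEXP '^(一|二)+$', should restrict the original query to values made up only of 一 and 二. MySQL 8.0 and later use a Unicode-aware regex library, so the character class form is expected to behave correctly there.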

REGEX for finding Japanese matches in MySQL
