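I have some data that contains non-English letters, so I want to check for letters from any language, plus spaces and a few special characters.
The special characters are: ' ( ) - &
I tried /^[\p{L} -()']+$/, but it does not match Castaٌeda or word Castaٌeda.
I want the first character to be a letter from any language, followed by any combination of the allowed characters.
The strings can be: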
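Answer 0 (score: 1)
"I want the first character to be a letter from any language, followed by any combination of the allowed characters."
You should rearrange your current regex so that it requires a letter as the first character, and quantify the character class that follows with * (zero or more occurrences).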
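However, there are a few things to note. It makes sense to replace the literal space with \s or \h (and to use the u modifier in PHP so that the pattern is Unicode-aware), or to add \x{00A0} to the character class to match hard (non-breaking) spaces. In the pattern below, the hyphen also sits at the end of the character class so that it is read as a literal rather than as a range operator, and \p{M} is included so that combining diacritics are accepted.
So, you can use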
$pattern = "~^\p{L}[\p{L}\p{M}\h().'&-]*$~u";
See the regex demo.
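As a rough illustration of why the pattern from the question behaves differently, the sketch below runs both patterns over three inputs. It assumes the u modifier is added to the question's pattern, and it assumes the mark in the question's sample is the combining character U+064C; the sample list is made up for this comparison.

```php
<?php
// The pattern from the question, with the u modifier added (an assumption) so the subject is read as UTF-8.
// Inside the class, " -(" is a range from U+0020 to U+0028, not three separate characters.
$original = "~^[\p{L} -()']+$~u";
// The pattern from the answer: hyphen last in the class, \p{M} added for combining marks.
$fixed = "~^\p{L}[\p{L}\p{M}\h().'&-]*$~u";

// "\u{064C}" is assumed to be the combining mark in the question's sample (PHP 7+ escape syntax).
$samples = ["first-second", "first#second", "Casta\u{064C}eda"];

foreach ($samples as $s) {
    printf("%s => original: %d, fixed: %d\n", $s, preg_match($original, $s), preg_match($fixed, $s));
}
```

Under these assumptions, the question's pattern rejects the hyphenated string and the string with the combining mark, yet accepts "first#second" because # falls inside the accidental range. The answer's pattern gives the opposite result in all three cases.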
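Details
^ - start of string
\p{L} - any Unicode letter
[\p{L}\p{M}\h().'&-]* - zero or more of:
    \p{L} - letters
    \p{M} - diacritics
    \h - horizontal whitespace
    ().'&- - these specific characters
$ - end of string (better still, add the D modifier or replace $ with \z to avoid matching before a final \n).
See the PHP demo: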
$arr = ["first-second", "first second", "first'second", "first & second", "first&second", "first(second)", "first (second)", "first-second-third", "first second third", "first second third(fourth)", "first-second-third(fourth)", "word Castaٌeda", "Alfonso Lista (Potia)", "Bacolod-Kalawi (Bacolod-Grande)", "Balindong (Watu)", "President Manuel A. Roxas", "Enrique B. Magalona (Saravia)", "Bacolod-Kalawi (Bacolod-Grande)", "Datu Blah T. Sinsuat", "Don Victoriano Chiongbian (Don Mariano Marcos)", "Bulalacao (San Pedro)", "Hinoba-an (Asia)"];
$pattern = "~^\p{L}[\p{L}\p{M}\h().'&-]*$~u";
foreach ($arr as $s) {
    echo $s;
    if (preg_match($pattern, $s)) {
        echo " => VALID\n";
    } else {
        echo " => INVALID\n";
    }
}
Output:
first-second => VALID
first second => VALID
first'second => VALID
first & second => VALID
first&second => VALID
first(second) => VALID
first (second) => VALID
first-second-third => VALID
first second third => VALID
first second third(fourth) => VALID
first-second-third(fourth) => VALID
word Castaٌeda => VALID
Alfonso Lista (Potia) => VALID
Bacolod-Kalawi (Bacolod-Grande) => VALID
Balindong (Watu) => VALID
President Manuel A. Roxas => VALID
Enrique B. Magalona (Saravia) => VALID
Bacolod-Kalawi (Bacolod-Grande) => VALID
Datu Blah T. Sinsuat => VALID
Don Victoriano Chiongbian (Don Mariano Marcos) => VALID
Bulalacao (San Pedro) => VALID
Hinoba-an (Asia) => VALID
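
The note about the D modifier and \z can be checked with a short sketch. The three variable names and the sample input are chosen here for illustration only; the character class is the same as in the answer's pattern, and only the end anchor differs.

```php
<?php
// Same character class as the answer's pattern; only the end-of-string handling differs.
$dollar = "~^\p{L}[\p{L}\p{M}\h().'&-]*$~u";   // plain $ may match before a final newline
$strict = "~^\p{L}[\p{L}\p{M}\h().'&-]*\z~u";  // \z matches only at the very end
$withD  = "~^\p{L}[\p{L}\p{M}\h().'&-]*$~uD";  // D makes $ behave like \z

$input = "Balindong (Watu)\n"; // note the trailing line feed

var_dump(preg_match($dollar, $input));
var_dump(preg_match($strict, $input));
var_dump(preg_match($withD, $input));
```

With this input, the first call should report a match and the other two should not, because \h does not cover the line feed and only the plain $ is allowed to stop just before it. If trailing newlines must be rejected, either of the last two forms is the safer choice.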