I have the following string:
Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL
I need to extract the following, each as a separate match:
Beyonce Knowles
Jay-Z
KANYE WEST
West Palm Beach, FL
San Antonio Texas
I'm still new to regular expressions, but so far I have /^[A-Z]+/
How can I fix my regex so that it captures the terms I want to extract?
Thanks
Answer 0 (score: 1)
You can try this:
/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u
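This matches one or more uppercase letters followed by zero or more letters of any case, optionally repeated any number of times, with each repetition separated by one or more whitespace or punctuation characters. It uses Unicode character classes (\p{Lu}, \p{L}, \p{P}) together with the u modifier, so it also works on text in other languages.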
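To make the pieces of that pattern easier to follow, here is a minimal sketch that writes the same expression with the x (extended) modifier, which lets whitespace and # comments sit inside the pattern. The variable names and the shortened sample sentence are only illustrative and are not part of the original answer.

```php
<?php
// Same pattern as above, spread over several lines with the x modifier.
// $verbose and $sample are illustrative names, not from the original answer.
$verbose = '/
    \p{Lu}+            # one or more uppercase letters start a word
    \p{L}*             # then zero or more letters of any case
    (?:                # optionally, more capitalized words follow
        [\s\p{P}]+     #   a run of whitespace or punctuation
        \p{Lu}+\p{L}*  #   then another capitalized word
    )*                 # the group repeats zero or more times
/xu';

$sample = 'Jay-Z met KANYE WEST in West Palm Beach, FL';
preg_match_all($verbose, $sample, $matches);

// $matches[0] holds: Jay-Z, KANYE WEST, West Palm Beach, FL
print_r($matches[0]);
```

Note that a single capitalized word also satisfies this pattern, so a sentence-initial word such as "The" would be returned as a match of its own.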
Alternatively, this pattern matches exactly two such words in a row:
/\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u
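As a rough illustration of how the two-word pattern behaves on the string from the question, the sketch below runs it with preg_match_all. The variable names are illustrative only.

```php
<?php
// Run the two-word pattern on the question's input.
// $pairPattern and $pairs are illustrative names.
$input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';
$pairPattern = '/\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u';

preg_match_all($pairPattern, $input, $pairs);

// Longer names are cut into pairs, and a leftover single word is dropped:
// Beyonce Knowles, Jay-Z, KANYE WEST, San Antonio, West Palm, "Beach, FL"
print_r($pairs[0]);
```

Because "San Antonio Texas" and "West Palm Beach, FL" contain more than two capitalized words, this variant splits them, so the pattern with the repeating group is probably the better fit for the question as asked.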
For example, using the first pattern:
$input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';
$pattern = '/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u';
// Collect every match; the full matches end up in $output_array[0].
preg_match_all($pattern, $input, $output_array);
print_r($output_array);
This produces the array:
Array
(
    [0] => Array
        (
            [0] => Beyonce Knowles
            [1] => Jay-Z
            [2] => KANYE WEST
            [3] => San Antonio Texas
            [4] => West Palm Beach, FL
        )

)
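For comparison, here is a short sketch of why the attempt in the question, /^[A-Z]+/, falls short. The ^ anchor ties the match to the start of the string, and [A-Z]+ stops at the first lowercase letter. The variable names are illustrative only.

```php
<?php
// What the pattern from the question matches, with and without the ^ anchor.
// $anchored and $unanchored are illustrative names.
$input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';

// Anchored at the start: only the leading "B" is found.
preg_match_all('/^[A-Z]+/', $input, $anchored);
print_r($anchored[0]);

// Without the anchor: isolated runs of capitals, not whole names.
preg_match_all('/[A-Z]+/', $input, $unanchored);
print_r($unanchored[0]);
```

Dropping the anchor alone is not enough, since the lowercase remainder of each word and the separators between words still need to be part of the pattern.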