Question

I have the following string:

Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL

I need to extract Beyonce Knowles, Jay-Z, KANYE WEST, West Palm Beach, FL, and San Antonio Texas as separate items.

I'm still new to regular expressions, but so far I have /^[A-Z]+/

How can I fix my regex so that it captures the names I want to extract?

Thanks

Answer 1

You can try this:

/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u

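This matches one or more uppercase letters followed by zero or more letters of any case (\p{L} is not limited to lowercase), and that unit may repeat any number of times, with the repetitions separated by one or more whitespace or punctuation characters. It uses Unicode character classes, so it also handles text in other languages.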
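To make the pieces easier to follow, here is the same pattern written out with the x (extended) flag, which lets PCRE ignore whitespace and # comments inside the pattern. This is only an annotated sketch; the shortened input string and the variable names are my own, chosen for illustration.

```php
<?php
// Same pattern as above, spread over several lines with the x flag so that
// each part can carry a comment. The u flag keeps Unicode matching enabled.
$annotated = '/
    \p{Lu}+ \p{L}*        # first word: uppercase letter(s), then any letters
    (?:                   # optionally followed by more words:
        [\s\p{P}]+        #   separator made of whitespace or punctuation
        \p{Lu}+ \p{L}*    #   next word, again starting with an uppercase letter
    )*
/xu';

// Shortened sample input (an assumption for this sketch).
$sample = 'KANYE WEST is awesome and San Antonio Texas is great';
preg_match_all($annotated, $sample, $matches);
print_r($matches[0]); // "KANYE WEST" and "San Antonio Texas"
```

A lowercase word such as "is" cannot start a new word in the group, which is why each match ends there.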

Alternatively, this matches exactly two such words in a row:

/\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u
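As a rough illustration of how the two-word variant behaves differently, the sketch below runs it on the string from the question. Because it always consumes exactly two words, longer names get split or cut short, so it is probably not what you want for this input. The variable names are mine.

```php
<?php
$text = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';

// Two-word variant: word, separator, word, with no repetition.
$twoWords = '/\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u';

preg_match_all($twoWords, $text, $pairs);
print_r($pairs[0]);
// "San Antonio Texas" is cut down to "San Antonio" (a lone "Texas" cannot
// match), and "West Palm Beach, FL" is split into "West Palm" and "Beach, FL".
```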

For example:

$input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';
$pattern = '/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u';
// Collect every match; the full matches end up in $output_array[0].
preg_match_all($pattern, $input, $output_array);
print_r($output_array);

This produces the array:

Array
(
    [0] => Array 
        (
            [0] => Beyonce Knowles
            [1] => Jay-Z
            [2] => KANYE WEST
            [3] => San Antonio Texas
            [4] => West Palm Beach, FL
        )
)
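One thing to keep in mind: since the repeated group is optional, the first pattern also matches a single capitalized word on its own, for instance the first word of a sentence. A small sketch with a made-up input string shows this:

```php
<?php
$regex = '/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u';

// Made-up sentence (not from the question) that starts with an ordinary
// capitalized word.
$sentence = 'The weather in San Antonio Texas is great';

preg_match_all($regex, $sentence, $found);
print_r($found[0]); // "The" is matched as well as "San Antonio Texas"
```

If that matters for your data, you would have to filter such words out afterwards.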
