Extracting capitalized and CamelCased words with a regular expression

Date: 2013-08-17 20:16:47

Tags: php regex

I have the following string:

Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL

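I need to extract the following, each as a separate match: Beyonce Knowles; Jay-Z; KANYE WEST; West Palm Beach, FL; San Antonio Texas.

I'm still new to regular expressions, but so far I have '/^[A-Z]+/'.

How can I fix my regex so that it extracts the words I want?

Thanks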
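A quick check (an editorial illustration, not part of the original question) shows why '/^[A-Z]+/' falls short: the ^ anchor restricts the match to the very start of the string, and [A-Z]+ stops at the first lowercase letter, so only "B" is found. Dropping the anchor finds every run of ASCII capitals, but names are still split apart and their lowercase letters are lost:

```php
<?php
$input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';

// ^ anchors the match to the start of the string, so at most one match is possible.
preg_match_all('/^[A-Z]+/', $input, $anchored);
print_r($anchored[0]); // only "B"

// Without ^, every run of ASCII uppercase letters matches, but lowercase letters are skipped.
preg_match_all('/[A-Z]+/', $input, $unanchored);
print_r($unanchored[0]); // B, K, J, Z, KANYE, WEST, S, A, T, W, P, B, FL
```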

1 answer:

Answer (score: 1)

You can try this:

/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u

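This matches one or more uppercase letters followed by zero or more letters of any case, optionally repeated any number of times, with each repetition separated from the previous word by one or more whitespace or punctuation characters. It uses Unicode character classes (\p{Lu} for uppercase letters, \p{L} for any letter, \p{P} for punctuation) together with the u modifier, so it also works on text in other languages.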
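To illustrate the Unicode point, here is a comparison on a made-up sample string containing accented letters (the sample text and the ASCII-only pattern are our own assumptions for illustration, not from the answer). The \p{...} pattern keeps the accented names whole, while an ASCII-only approximation without the u modifier drops or truncates them:

```php
<?php
// Hypothetical sample text with non-ASCII capitals (É, Á) and lowercase letters (é, ü, í).
$text = 'hier, Émile Zola a visité Zürich avec Ángel Di María';

// Unicode-aware pattern from the answer; the u modifier makes PCRE treat the input as UTF-8.
$unicode = '/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u';
preg_match_all($unicode, $text, $m1);
print_r($m1[0]); // Émile Zola, Zürich, Ángel Di María

// ASCII-only approximation for comparison: [A-Z] and [a-z] know nothing about accented letters.
$ascii = '/[A-Z][a-z]*(?:\s+[A-Z][a-z]*)*/';
preg_match_all($ascii, $text, $m2);
print_r($m2[0]); // Zola, Z, Di Mar
```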

Or this one, which matches exactly two such words in a row:

/\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u
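Running this second pattern on the question's input (our own check, not shown in the original answer) makes the trade-off visible: every match is exactly two words, so "Texas" is dropped from "San Antonio Texas" and "West Palm Beach, FL" is split into two pairs:

```php
<?php
$input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';

// Exactly two capitalized words joined by whitespace/punctuation; there is no repetition group.
$pairPattern = '/\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u';
preg_match_all($pairPattern, $input, $pairs);
print_r($pairs[0]);
// "Beyonce Knowles", "Jay-Z", "KANYE WEST", "San Antonio", "West Palm", "Beach, FL"
```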

For example:

<?php
$input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';
$pattern = '/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u';
// $output_array[0] receives every full-pattern match
preg_match_all($pattern, $input, $output_array);
print_r($output_array);

This produces the array:

Array
(
    [0] => Array 
        (
            [0] => Beyonce Knowles
            [1] => Jay-Z
            [2] => KANYE WEST
            [3] => San Antonio Texas
            [4] => West Palm Beach, FL
        )
)
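One caveat worth testing against real data (our observation, not part of the original answer): the first pattern matches any capitalized word, so an ordinary word that is capitalized only because it starts a sentence is extracted too. The sentence below is made up for illustration; note that preg_match_all also returns the number of matches:

```php
<?php
// Hypothetical sentence: "The" is capitalized only because it starts the sentence.
$sentence = 'The meeting with Jay-Z was in Texas';
$pattern = '/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u';

$count = preg_match_all($pattern, $sentence, $matches);
echo $count, "\n";    // 3
print_r($matches[0]); // "The", "Jay-Z", "Texas"
```

Telling such words apart from real names needs extra logic beyond the regex itself.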