I wrote a regular expression for the various names that appear in my strings:
$nameRegex = "/[A-Z-ÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ']" .
"[.A-Z-ÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽa-z-àáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšž']" .
'+\b(?: \b' .
"[A-Z-ÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ']?[van|de]" .
"[A-Z-ÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽa-z-àáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšž']+\b)*/u";
I am trying to match all the non-standard cases, for example (input | expected match):
John Doe waves | John Doe
Bakary N'Diaye says hello | Bakary N'Diaye
Iván Aguilar goes well | Iván Aguilar
Cisteró shot | Cisteró
Dan I Soylu shots | Dan I Soylu
Mike van der Hoorn with a cross | Mike van der Hoorn
M.J. Williams takes a shot | M.J. Williams
Donny van de Beek left foot | Donny van de Beek
Mike van der Hoorn hello | Mike van der Hoorn
Artak G. Grigoryan with through ball | Artak G. Grigoryan
Trent Alexander-Arnold after a break | Trent Alexander-Arnold
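But my regex does a poor job of matching these names. You can see it in action here: https://regexr.com/4qgbt
How can I improve my regex so that it captures all of the names? (The name is always at the beginning of the sentence.)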
Answer 0 (score: 2)
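Maybe an expression similar to
^([\p{L} '.-]+?)(?:\s[a-z]+)*\h*$
would work (with preg_match_all). It consists of two groups. The first group, on the left, is a capturing group that collects the name. The second group, on the right, is a non-capturing group that consumes everything after the name, which we are not interested in.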
$re = '/^([\p{L} \'.-]+?)(?:\s[a-z]+)*\s*$/m';
$str = 'John Doe waves
Bakary N\'Diaye says hello
Iván Aguilar goes well
Cisteró shot
Dan I Soylu shots
Mike van der Hoorn with a cross
M.J. Williams takes a shot
Donny van de Beek left foot
Mike van der Hoorn hello
Artak G. Grigoryan with through ball
Trent Alexander-Arnold after a break
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
array(9) {
[0]=>
array(2) {
[0]=>
string(14) "John Doe waves"
[1]=>
string(8) "John Doe"
}
[1]=>
array(2) {
[0]=>
string(25) "Bakary N'Diaye says hello"
[1]=>
string(14) "Bakary N'Diaye"
}
[2]=>
array(2) {
[0]=>
string(17) "Dan I Soylu shots"
[1]=>
string(11) "Dan I Soylu"
}
[3]=>
array(2) {
[0]=>
string(31) "Mike van der Hoorn with a cross"
[1]=>
string(18) "Mike van der Hoorn"
}
[4]=>
array(2) {
[0]=>
string(26) "M.J. Williams takes a shot"
[1]=>
string(13) "M.J. Williams"
}
[5]=>
array(2) {
[0]=>
string(27) "Donny van de Beek left foot"
[1]=>
string(17) "Donny van de Beek"
}
[6]=>
array(2) {
[0]=>
string(24) "Mike van der Hoorn hello"
[1]=>
string(18) "Mike van der Hoorn"
}
[7]=>
array(2) {
[0]=>
string(36) "Artak G. Grigoryan with through ball"
[1]=>
string(18) "Artak G. Grigoryan"
}
[8]=>
array(2) {
[0]=>
string(37) "Trent Alexander-Arnold after a break
"
[1]=>
string(22) "Trent Alexander-Arnold"
}
}
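Note that the dump above has only 9 entries for 11 input lines: "Iván Aguilar" and "Cisteró" are missing. The likely cause is that the pattern is compiled without the u modifier, so \p{L} is tested against single bytes rather than whole UTF-8 characters. Below is a minimal sketch of the same pattern with u added, assuming the input is valid UTF-8. The shortened $str and the use of array_column to pull out group 1 are choices made for this illustration, not part of the original answer.

```php
<?php
// Same pattern as above, plus the `u` modifier (assumption: input is valid UTF-8).
$re = '/^([\p{L} \'.-]+?)(?:\s[a-z]+)*\s*$/mu';

// Shortened sample input, including the two lines missing from the dump above.
$str = "John Doe waves\n"
     . "Iván Aguilar goes well\n"
     . "Cisteró shot\n"
     . "Mike van der Hoorn with a cross\n";

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Keep only capture group 1 (the name) from every match set.
$names = array_column($matches, 1);

print_r($names);
```

With PREG_SET_ORDER every element of $matches is one match (index 0 is the full match, index 1 is the name), which is why array_column($matches, 1) yields a flat list of names.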
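On the left side of the input strings there seems to be no problem, since every line starts with a name. On the right side, however, each line ends with lowercase words separated by single spaces. Here we can write a statement that finds those words, possibly with a positive lookahead: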
(?=(?:\s[a-z]+)*\h*$)
and then a second statement,
^[\p{L} '.-]+?
that collects the names, so the final expression becomes:
^[\p{L} '.-]+?(?=(?:\s[a-z]+)*\h*$)
$re = '/^[\p{L} \'.-]+?(?=(?:\s[a-z]+)*\h*$)/m';
$str = 'John Doe waves
Bakary N\'Diaye says hello
Iván Aguilar goes well
Cisteró shot
Dan I Soylu shots
Mike van der Hoorn with a cross
M.J. Williams takes a shot
Donny van de Beek left foot
Mike van der Hoorn hello
Artak G. Grigoryan with through ball
Trent Alexander-Arnold after a break
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
array(9) {
[0]=>
array(1) {
[0]=>
string(8) "John Doe"
}
[1]=>
array(1) {
[0]=>
string(14) "Bakary N'Diaye"
}
[2]=>
array(1) {
[0]=>
string(11) "Dan I Soylu"
}
[3]=>
array(1) {
[0]=>
string(18) "Mike van der Hoorn"
}
[4]=>
array(1) {
[0]=>
string(13) "M.J. Williams"
}
[5]=>
array(1) {
[0]=>
string(17) "Donny van de Beek"
}
[6]=>
array(1) {
[0]=>
string(18) "Mike van der Hoorn"
}
[7]=>
array(1) {
[0]=>
string(18) "Artak G. Grigoryan"
}
[8]=>
array(1) {
[0]=>
string(22) "Trent Alexander-Arnold"
}
}
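The dump above again lacks the two accented names, most likely because the pattern has no u modifier. The following is a small sketch of the lookahead variant with u added, assuming UTF-8 input. Because the lookahead consumes nothing, the full match is already the name, so the default PREG_PATTERN_ORDER gives a flat list in $matches[0]. The shortened $str is only for illustration.

```php
<?php
// Lookahead variant with the `u` modifier added (assumption: input is valid UTF-8).
$re = '/^[\p{L} \'.-]+?(?=(?:\s[a-z]+)*\h*$)/mu';

$str = "Bakary N'Diaye says hello\n"
     . "Iván Aguilar goes well\n"
     . "Cisteró shot\n"
     . "Donny van de Beek left foot\n";

// Default PREG_PATTERN_ORDER: $matches[0] is a flat list of the full matches.
preg_match_all($re, $str, $matches);
$names = $matches[0];

print_r($names);
```

The lazy +? grows the match one character at a time and stops at the first position from which only lowercase words remain on the line. That is why "van de" in the middle of a name is kept: the capitalized "Beek" after it makes the lookahead fail until the whole name has been consumed.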
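I guess we could also look at the preg_replace function, forget about the names entirely, and focus on matching the right-hand boundary of the name in each line, maybe with a simple expression similar to: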
(?:\s[a-z]+){0,}\h*$
or:
(?:\s*\b[a-z]+){0,}\h*$
$re = '/(?:\s[a-z]+){0,}\h*$/m';
$str = 'John Doe waves
Bakary N\'Diaye says hello
Iván Aguilar goes well
Cisteró shot
Dan I Soylu shots
Mike van der Hoorn with a cross
M.J. Williams takes a shot
Donny van de Beek left foot
Mike van der Hoorn hello
Artak G. Grigoryan with through ball
Trent Alexander-Arnold after a break ';
echo preg_replace($re, '', $str);
John Doe
Bakary N'Diaye
Iván Aguilar
Cisteró
Dan I Soylu
Mike van der Hoorn
M.J. Williams
Donny van de Beek
Mike van der Hoorn
Artak G. Grigoryan
Trent Alexander-Arnold
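If an array of names is more convenient than one cleaned-up string, the replacement can be followed by a split on line breaks. The sketch below assumes UTF-8 input. The helper name extractNames is hypothetical, and it uses \h instead of \s inside the group (a deliberate deviation from the expression above) so that a match can never run across a line break.

```php
<?php
// Hypothetical helper (not from the answer): strip the trailing lowercase
// words from every line, then return one name per line.
function extractNames(string $text): array
{
    // \h instead of \s is this sketch's choice, so a match cannot cross a line break.
    // The `u` modifier is an assumption: it makes \h operate on characters, not bytes.
    $stripped = preg_replace('/(?:\h[a-z]+)*\h*$/mu', '', $text);

    // \R matches any line-break sequence; empty pieces are dropped.
    return preg_split('/\R/', $stripped, -1, PREG_SPLIT_NO_EMPTY);
}

$names = extractNames(
    "M.J. Williams takes a shot\nDan I Soylu shots\nTrent Alexander-Arnold after a break "
);

print_r($names);
```

Like the expression it is based on, this only works while the text after the name consists of lowercase ASCII words.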
preg_replace
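Maybe this would be the simplest and fastest approach. Here we use a greedy expression to reach the last uppercase letter in a line, and then append a \S+ or \S* to take the rest of that word: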
^.*\p{Lu}\S+
or
^.*\p{Lu}\S*
or with a numeric quantifier:
^.{0,50}\p{Lu}\S*
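As a rough sketch of how the greedy expressions above could be applied, the snippet below runs ^.*\p{Lu}\S* over a few of the sample lines. The m and u modifiers are assumptions of this example (m so that ^ anchors at every line, u so that \p{Lu} and \S see whole UTF-8 characters), and the shortened $str is for illustration only.

```php
<?php
// Greedy variant: match up to the last uppercase letter on the line,
// plus the rest of that word. The `m` and `u` modifiers are assumptions.
$re = '/^.*\p{Lu}\S*/mu';

$str = "Cisteró shot\n"
     . "M.J. Williams takes a shot\n"
     . "Artak G. Grigoryan with through ball\n"
     . "Mike van der Hoorn hello\n";

// $matches[0] holds the full match for every line.
preg_match_all($re, $str, $matches);
$names = $matches[0];

print_r($names);
```

This relies on the name containing the last uppercase letter of the line. If the trailing text has a capital of its own (for example another player's name), the match extends up to that word.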
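If you wish to simplify, update, or explore the expression, it is explained in the top-right panel of regex101.com. If you are interested, you can watch the matching steps or modify them in this debugger link. The debugger demonstrates how a RegEx engine might step through some sample input strings and perform the matching process.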