Question

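I've been trying to get this regex right all morning and I've hit a wall. In the following string I want to match every forward slash after .com/<first_word>, except for any / after the URL.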

$string = "http://example.com/foo/12/jacket Input/Output";
    match------------------------^--^

The length of the words between the slashes doesn't matter.

Regex: (?<=.com\/\w)(\/)
Results:

$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
    matches--------------------^

Regex: (?<=\/\w)(\/)
Results:

$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
    matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
    matches--------------------^-^--------------^
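You can confirm the results above by running both attempts through preg_match_all with PREG_OFFSET_CAPTURE and listing the offsets of the slashes they match. This is a minimal sketch that assumes PHP 7.4+ for the arrow function. slashOffsets is a hypothetical helper, not part of PHP.

The first lookbehind only fits when exactly one word character follows .com/. The second fires on any slash preceded by a slash plus one word character, wherever that occurs in the string.

```php
<?php
// Hypothetical helper: returns the byte offsets of every whole-pattern match.
function slashOffsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    // $m[0] holds [matchedText, offset] pairs for the whole match.
    return array_map(fn(array $hit): int => $hit[1], $m[0]);
}

// The two attempts from the question.
$attempt1 = '~(?<=.com\/\w)(\/)~';
$attempt2 = '~(?<=\/\w)(\/)~';

// Attempt 1 needs exactly one word character between ".com/" and the slash.
$a = slashOffsets($attempt1, 'http://example.com/foo/12/jacket Input/Output');
$b = slashOffsets($attempt1, 'http://example.com/f/12/jacket Input/Output');

// Attempt 2 matches any "/x/" pattern, including one after the URL.
$c = slashOffsets($attempt2, 'http://example.com/foo/20/jacket Input/O/utput');
$d = slashOffsets($attempt2, 'http://example.com/f/2/jacket Input/O/utput');

printf("%s | %s | %s | %s\n", json_encode($a), json_encode($b), json_encode($c), json_encode($d));
```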

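Since a lookbehind can't contain quantifiers such as + or * and has to be a zero-width assertion, I'm wondering whether I've gone down the wrong path and should be looking for a different regex combination.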

Is a positive lookbehind the right approach? Or am I missing something besides a large amount of coffee?

Note: tagged with PHP because the regex should work in any of the preg_* functions.

Answer 1

You can use \K together with \G here. Grab the groups: the slashes are captured in group 1.

^.*?\.com\/\w+\K|\G(\/)\w+\K

See the demo.

https://regex101.com/r/aT3kG2/6

$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m"; 
$str = "http://example.com/foo/12/jacket Input/Output"; 

preg_match_all($re, $str, $matches);

Replace

$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m"; 
$str = "http://example.com/foo/12/jacket Input/Output"; 
$subst = "|"; 

$result = preg_replace($re, $subst, $str);

Answer 2

If you want to use preg_replace, this regex should work:

$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output

So every / after the first path segment that follows .com/ is replaced with |, and the chain stops at the first whitespace.

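The negative lookbehind (?<!^) is needed to avoid replacing slashes in a string that has no .com part, such as /foo/bar/baz/abcd. Without it, \G would match at the very start of the subject.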
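This sketch shows what the (?<!^) guard protects against. It compares Answer 2's pattern with a hypothetical variant that drops the guard, on the example URL and on /foo/bar/baz/abcd. It assumes PHP 7+ and uses only preg_replace; the variable names are illustrative.

```php
<?php
// Answer 2's pattern, and a hypothetical variant without the (?<!^) guard.
$withGuard    = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$withoutGuard = '~(?:^.*?\.com/|\G)[^/\h]*\K/~';

$url    = 'http://example.com/foo/12/jacket Input/Output';
$noHost = '/foo/bar/baz/abcd';

// With a ".com" entry point, both patterns behave the same.
$urlGuarded   = preg_replace($withGuard, '|', $url);
$urlUnguarded = preg_replace($withoutGuard, '|', $url);

// Without ".com", a bare \G is true at offset 0, so the unguarded
// pattern starts chaining matches from the beginning of the subject.
$noHostGuarded   = preg_replace($withGuard, '|', $noHost);
$noHostUnguarded = preg_replace($withoutGuard, '|', $noHost);

echo $urlGuarded, "\n", $urlUnguarded, "\n";
echo $noHostGuarded, "\n", $noHostUnguarded, "\n";
```

With the guard, /foo/bar/baz/abcd comes back unchanged. Without it, every slash in that string is replaced.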

RegEx Demo

Answer 3

Another idea based on \G and \K:

$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';

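The (?: non-capturing group sets the entry point ^\S+\.com/\w, or glues each further match to the end of the previous one with \G(?!^).
\w*+\K/ possessively matches any number of word characters up to a slash. \K resets the start of the reported match.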
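The following sketch runs Answer 3's pattern through preg_match_all with PREG_OFFSET_CAPTURE. It assumes PHP 7+, and the variable names are illustrative.

Because \K resets the match start, each reported match is just the slash itself. In the example string those slashes sit at offsets 22 and 25. With preg_replace, the same pattern gives the same result as Answer 2 on this input.

```php
<?php
// Answer 3's pattern: an entry point (^\S+\.com/\w) or a glued
// continuation (\G(?!^)), then \w*+ consumes the segment possessively
// and \K drops it from the reported match.
$re  = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
$str = 'http://example.com/foo/12/jacket Input/Output';

// Each reported match is only "/", and its offset points at the slash.
preg_match_all($re, $str, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1);

$replaced = preg_replace($re, '|', $str);

// Answer 2's pattern on the same input, for comparison.
$answer2 = preg_replace('~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~', '|', $str);

print_r($offsets);
echo $replaced, "\n", $answer2, "\n";
```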

See demo at regex101

Match all specific characters after a positive lookbehind
