Question

I'm trying to write a regex that surrounds "http" URLs with angle brackets, except on lines that begin with two slashes. The best I've come up with is:

s#^(?!//)(.*?)(http://[^\s]+)#$1<$2>#gm;

This works for these two:

Input: http://a.com

Output: <http://a.com>

Input: //http://a.com

Output: //http://a.com

However, it fails here:

Input: http://a.com http://b.com

Actual output: <http://a.com> http://b.com

Desired output: <http://a.com> <http://b.com>

Why doesn't my regex keep matching? Am I using /g wrong?

Answer 1

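You should use two regular expressions: one to identify the "commented-out" lines, and another to modify the http URLs in the regular lines.

There may be a non-standard way to combine the two regexes, or to replace all of the multiple (http...)+ matches, but I wouldn't use it.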
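As a rough illustration of the two-regex idea, here is a minimal sketch that assumes the text can be processed one line at a time. The helper name bracket_urls is made up for this example and does not come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: bracket every http:// URL in $text,
# leaving lines that start with "//" untouched.
sub bracket_urls {
    my ($text) = @_;

    # Split on newlines but keep them (captured separator),
    # so the text can be reassembled exactly as it was.
    my @pieces = split /(\n)/, $text;

    for my $piece (@pieces) {
        next if $piece =~ m{^//};           # regex 1: skip "commented-out" lines
        $piece =~ s{(http://\S+)}{<$1>}g;   # regex 2: bracket every URL on the line
    }
    return join '', @pieces;
}

print bracket_urls("http://a.com http://b.com\n//http://c.com\n");
```

Keeping the two checks separate means neither regex has to know about the other, at the cost of an explicit loop over the lines.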

Answer 2

You can't really do that for an unlimited number of matches. Try this instead:

s#(http://[^\s]+)#<$1>#g unless m#^//#;

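This replaces every URL in the line, but only if the first two characters of the line are not "//". Sure, it's a bit more involved, but it works (I think).

Edit: My answer is the same as aib's, but I have code.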
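A quick way to check this against the three inputs from the question is sketched below. It assumes each input is a single line; the wrapper bracket_line is a hypothetical name used only so the statement can be called repeatedly.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-line statement above.
sub bracket_line {
    my ($line) = @_;
    for ($line) {   # alias $_ to $line so the statement runs unchanged
        s#(http://[^\s]+)#<$1>#g unless m#^//#;
    }
    return $line;
}

# The three inputs from the question.
for my $input ('http://a.com', '//http://a.com', 'http://a.com http://b.com') {
    print "$input => ", bracket_line($input), "\n";
}
```

Note that m#^//# has no /m modifier, so it only looks at the start of the whole string. For multi-line text the statement would need to be applied to each line separately.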

Answer 3

A rewrite... following my suggestion and using the whitespace (x) modifier so that it's actually readable. :)

s{
    (?:^|\G)     # start of a line, or where the previous match ended (\G); non-capturing
    (?!//)       # not immediately followed by //
    (.*?)        # then anything, as little as possible
    (
        http://  # then http://
        [^\s]+   # and non-spaces - you could also use \S+
    )
 }
 {$1<$2>}xmg;

Trying this in Perl, we get:

sub test {
    my ($str, $expect) = @_;
    my $mod = $str;
    $mod =~ s{
            (?:^|\G)     # start of a line, or where the previous match ended (\G)
            (?!//)       # not immediately followed by //
            (.*?)        # then anything, as little as possible
            (
                http://  # then http://
                [^\s]+   # and non-spaces - you could also use \S+
            )
          }
          {$1<$2>}xmg;
    print "Expecting '$expect' got '$mod' - ";
    print $mod eq $expect ? "passed\n" : "failed\n";
}

test("http://foo.com",    "<http://foo.com>");
test("// http://foo.com", "// http://foo.com");
test("foo\nhttp://a.com","foo\n<http://a.com>");

# output is 
# Expecting '<http://foo.com>' got '<http://foo.com>' - passed
# Expecting '// http://foo.com' got '// http://foo.com' - passed
# Expecting 'foo
# <http://a.com>' got 'foo
# <http://a.com>' - passed

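Edit: Changed a couple of things: added the 'm' modifier to make sure it matches at the start of each line, and changed \G to (?:^|\G) to make sure it also starts looking at the beginning of a line.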
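The tests above do not cover the input that failed in the question, so here is a small sketch that runs both the question's original substitution and the \G version on it. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $input = 'http://a.com http://b.com';

# Pattern from the question: ^ can only match at the start of a line,
# so after the first replacement /g has nowhere left to match.
(my $anchored = $input) =~ s#^(?!//)(.*?)(http://[^\s]+)#$1<$2>#gm;

# Same pattern with (?:^|\G): \G matches where the previous match ended,
# so the next attempt can resume right after the last URL.
(my $resumed = $input) =~ s{(?:^|\G)(?!//)(.*?)(http://\S+)}{$1<$2>}gm;

print "anchored: $anchored\n";   # anchored: <http://a.com> http://b.com
print "resumed:  $resumed\n";    # resumed:  <http://a.com> <http://b.com>
```

This also answers the "am I using /g wrong?" part of the question: /g does retry after the first match, but every retry still has to satisfy ^, which cannot happen in the middle of a line.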

How do I write a regex that performs multiple substitutions on each line, except when the line starts with a certain string?