Question

I have the following method in some Nemerle code:

private static getLinks(text : string) : array[string] {
        def linkrx = Regex(@"<a\shref=['|\"](.*?)['|\"].*?>");
        def m = linkrx.Matches(text);
        mutable txmatches : array[string];
        for (mutable i = 0; i < m.Count; ++i) {
            txmatches[i] = m[i].Value;
        }
        txmatches
    }

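The problem is that the compiler, for some reason, tries to parse the brackets inside the regex string, and the program fails to compile. If I remove the @ (which I was told to put there), I get an "invalid escape character" error on the \s.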

Here's the compiler output:

NCrawler.n:23:21:23:22: error: when parsing this `(' brace group
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'
NCrawler.n:22:57:22:58: error: when parsing this `{' brace group
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'
NCrawler.n:8:1:8:2: error: when parsing this `{' brace group
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'

(Line 23 is the line with the regex.)

What should I do?

Answer 1

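I don't know Nemerle, but it looks like @ disables all escapes, including the backslash escape for the " character. That would end the string at the first \" and leave the rest of the pattern to be parsed as code.

Try one of the following: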

def linkrx = Regex("<a\\shref=['\"](.*?)['\"].*?>");

def linkrx = Regex(@"<a\shref=['""](.*?)['""].*?>");

def linkrx = Regex(@"<a\shref=['\x22](.*?)['\x22].*?>");
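The third variant works because `\x22` is decoded by the regex engine, not by the string literal, so no quote character ever appears inside the string. Here is a minimal sketch of that idea, written in Python's `re` module rather than Nemerle so it can be run standalone (.NET's Regex also accepts `\xNN` escapes). The `hrefs` helper is a made-up name for illustration.

```python
import re

# Raw string: Python passes the backslashes through untouched, so the regex
# engine itself sees \s and \x22 (0x22 is the double-quote character).
link_rx = re.compile(r"<a\shref=['\x22](.*?)['\x22].*?>")


def hrefs(html):
    """Return the captured href values (hypothetical helper)."""
    return link_rx.findall(html)


print(hrefs('<a href="one">1</a> <a href=\'two\' id="x">2</a>'))
```

Both quote styles are accepted, which is what the character class `['\x22]` is meant to express.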

Answer 2

I'm not a Nemerle programmer, but I do know that you should always use an XML parser, not regexps, to process XML-based data.

I'd guess someone has written a DOM or XPath library for Nemerle, so you could access //a[@href] via XPath, or something like a.href.value via DOM.
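As a rough illustration of the parser-based approach, here is a sketch in Python using the standard library's `html.parser`. This is not a Nemerle API; it only shows the shape of the idea, and `LinkCollector` and `get_links` are hypothetical names. In Nemerle the same thing would go through whatever DOM or XPath library is available.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def get_links(text):
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


print(get_links('<a class="foo" href="something">bar</a>'))
```

Because the parser handles attribute order, quoting style, and letter case, none of those need to be encoded in a pattern.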

The current regex won't match, for example,

<a class="foo" href="something">bar</a>

I haven't tested it, but it should be more like

/<a\s.+?href=['|\"]([^'\">]+)['|\"].+?>/i
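Since the pattern above is untested, one way to check this kind of change is to run the patterns against the sample tag. The sketch below does that in Python rather than Nemerle. Note that `relaxed` is a hypothetical variant, not the answer's exact regex: it uses `[^>]*` instead of `.+?` so the match cannot run past the end of the tag, and so that `href` may also be the first attribute. `first_href` is a made-up helper name.

```python
import re

# Pattern from the question: expects href immediately after "<a ".
original = re.compile(r"""<a\shref=['"](.*?)['"].*?>""")

# Hypothetical variant: allow other attributes before href, and stay inside
# a single tag by excluding '>' instead of matching any character.
relaxed = re.compile(r"""<a\s[^>]*?href=['"]([^'">]+)['"][^>]*>""", re.IGNORECASE)


def first_href(pattern, html):
    """Return the first captured href, or None if nothing matches."""
    m = pattern.search(html)
    return m.group(1) if m else None


sample = '<a class="foo" href="something">bar</a>'
print(first_href(original, sample))
print(first_href(relaxed, sample))
```

A pattern like this still only approximates HTML; for instance it would also accept an attribute whose name merely ends in `href`.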

Answer 3

The problem is the quotes, not the brackets. In Nemerle, as in C#, a quote inside an @-string is escaped with another quote, not with a backslash.

@"<a\shref=['""](.*?)['""].*?>"

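Edit: Also note that you don't need the pipe inside the square brackets; the contents are treated as a set of characters (or character ranges), so the OR is implied. With the pipe in there, the class also matches a literal | character.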
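The point about the pipe is easy to verify. The following is a small sketch in Python rather than Nemerle (character classes behave the same way here in both engines), and `matches` is a hypothetical helper name.

```python
import re

with_pipe = re.compile(r"""['|"]""")    # character class as written in the question
without_pipe = re.compile(r"""['"]""")  # same class with the pipe removed


def matches(pattern, ch):
    """True if the whole string ch is matched by the pattern."""
    return pattern.fullmatch(ch) is not None


for ch in ["'", '"', "|"]:
    print(repr(ch), matches(with_pipe, ch), matches(without_pipe, ch))
```

Both classes accept either quote character; only the one containing the pipe also accepts `|`, which is not wanted for an attribute delimiter.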

Syntax error in a regex to match link URLs