Question

I am trying to extract part of a string.

Using this expression:

@"<a .*href=""(?<Url>(.*))(?="")"""

Sample data to match:

var input = @"<html lang=""en"">
    <head>
        <link href=""http://www.somepage.com/c/main.css"" rel=""stylesheet"" type=""text/css"" />

        <link rel=""canonical"" href=""http://www.somepage.com"" />
        <script src=""http://www.somepage.com/professional/bower_components/modernizr/modernizr.js"" type=""text/javascript""></script>
    </head>
        <body>
            <header>
                <div>
                    <div>
                        <a aria-haspopup=""true"" href=""http://www.somepage.com/someotherpage""><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>
                    </div>
                </div>
            </header>
        </body>
    </html>"

Right now I am getting this value:

http://www.somepage.com/someotherpage\"><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>

Using this code:

var regexPattern = new Regex(PATTERN, RegexOptions.IgnoreCase);
var matches = regexPattern.Matches(httpResult);
foreach (Match match in matches)
{
    // Here I'm getting this value:
    var extractedValue = match.Groups["Url"].Value; // its value is http://www.somepage.com/someotherpage\"><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>
}

What I want to get from match.Groups["Url"].Value is simply http://www.somepage.com/someotherpage, with nothing after the href attribute value.

Is it possible to get just that part of the match without calling Substring on extractedValue?

Answer 1

You're almost there. Only one small change to the regex is needed: don't allow quotes inside the captured set.

<a .*href=""(?<Url>([^""]*))(?="")""
                  //^^^^^^ This is what I changed (the quote is doubled because the pattern sits in a verbatim string).
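For anyone who wants to try the change, here is a minimal sketch. It assumes the pattern lives in a C# verbatim string, so the quote inside the character class is written as "". The class name NegatedClassDemo, the method ExtractUrls, and the one-line sample are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Verbatim string: every "" below stands for a single quote character.
    // [^""]* captures everything up to (but not including) the next quote.
    private const string Pattern = @"<a .*href=""(?<Url>([^""]*))(?="")""";

    // Hypothetical helper: returns the Url group of every match.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match match in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            urls.Add(match.Groups["Url"].Value);
        }
        return urls;
    }

    public static void Main()
    {
        // One-line sample modelled on the anchor from the question.
        var html = @"<a aria-haspopup=""true"" href=""http://www.somepage.com/someotherpage""><img src=""http://www.somepage.com/i/sprite/logo.png"" /></a>";

        foreach (var url in ExtractUrls(html))
        {
            Console.WriteLine(url); // http://www.somepage.com/someotherpage
        }
    }
}
```

Because the capture can no longer cross a quote, no Substring call is needed afterwards.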

Answer 2

Maybe this will work. Unfortunately I don't have time to test it right now:

"<a[^>]*href=\"(?<Url>([^\"]+))\"[^>]*>"
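Since the pattern above was posted untested, here is a rough sketch of how one might try it out. The class name AnchorTagDemo, the method ExtractUrls, and the trimmed-down sample are assumptions for illustration. One thing to keep in mind: <a[^>]* also accepts any other tag whose name starts with "a" (for example <area>), which does not matter for the sample data but may matter elsewhere.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchorTagDemo
{
    // Same pattern as above, in a regular (non-verbatim) string literal.
    // [^>]* keeps the match inside one tag; [^\"]+ stops at the closing quote.
    private const string Pattern = "<a[^>]*href=\"(?<Url>([^\"]+))\"[^>]*>";

    // Hypothetical helper: returns the Url group of every match.
    public static string[] ExtractUrls(string html)
    {
        return Regex.Matches(html, Pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Groups["Url"].Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Trimmed-down sample in the spirit of the question's input.
        var html = "<link rel=\"canonical\" href=\"http://www.somepage.com\" />\n"
                 + "<a aria-haspopup=\"true\" href=\"http://www.somepage.com/someotherpage\">"
                 + "<img src=\"http://www.somepage.com/i/sprite/logo.png\" /></a>";

        foreach (var url in ExtractUrls(html))
        {
            Console.WriteLine(url); // http://www.somepage.com/someotherpage
        }
    }
}
```

The <link> element is skipped because the pattern requires the tag to start with <a.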

Answer 3

The following should work:

<a .*href=""(?<Url>(.+?))(?="")""

The problem is that the * in (.*) is greedy. +? "Matches the previous element one or more times, but as few times as possible", so it will stop at the first quote. For more on greediness in regular expressions, see Regex Tutorial - Repetition with Star and Plus.
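To see the difference in isolation, here is a small sketch that runs a greedy and a lazy capture against the same text. The shortened sample string, the class name GreedyVsLazyDemo, and the helper Capture are made up for illustration; the patterns are simplified to just the href part.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Hypothetical helper: returns the Url group of the first match.
    public static string Capture(string pattern, string text)
    {
        return Regex.Match(text, pattern).Groups["Url"].Value;
    }

    public static void Main()
    {
        var text = @"<a href=""http://www.somepage.com/someotherpage""><img src=""logo.png"" /></a>";

        // Greedy: .* grabs as much as it can, then backtracks to the LAST quote.
        var greedy = @"href=""(?<Url>.*)""";

        // Lazy: .+? grows one character at a time and stops at the FIRST quote.
        var lazy = @"href=""(?<Url>.+?)""";

        Console.WriteLine(Capture(greedy, text));
        // http://www.somepage.com/someotherpage"><img src="logo.png

        Console.WriteLine(Capture(lazy, text));
        // http://www.somepage.com/someotherpage
    }
}
```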

Answer 4

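Use this pattern, which also cuts down on backtracking by not using the .* idiom for the captured value (faster processing). The pattern uses \x22 in place of " so that it is easier to work with, because it avoids the C# string-literal quoting confusion.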

// Requires: using System.Linq; using System.Text.RegularExpressions;
var urls = Regex.Matches(input, @"<a.+href=\x22(?<Url>[^\x22]+).+/a>")
                .OfType<Match>()
                .Select(mt => mt.Groups["Url"].Value);
// Result = http://www.somepage.com/someotherpage
