Question

In C#, I have the following pattern for my regular expression:

string pattern = "<div class=\"alt\" title=\"[\\w\\s]+\"><strong>([\\w\\s]+)</strong></div>";

I create a Match object like this:

status = Regex.Match(html, pattern);

But if I call .groups() on status, I get blank text, even though there is a match. Am I extracting the group correctly?

Edit: here is some of the HTML:

          <tr>
            <td>
                    <div class="alt" title="Released to Manufacturing">
                            <strong>Released to Manufacturing</strong>

Answer 1

string strRegex = @"<div class=""alt"" title=""[\w\s]+""><strong>([\w\s]+)</strong></div>";
RegexOptions myRegexOptions = RegexOptions.IgnoreCase | RegexOptions.Multiline;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"<div class=""alt"" title=""released""><strong>Released</strong></div>";

foreach (Match myMatch in myRegex.Matches(strTargetString))
{
    if (myMatch.Success)
    {
        var value = myMatch.Groups[1].Value;
    }
}

Verified with RegexHero.
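
One thing worth noting: the HTML in the question has a line break and indentation between the div tag and the strong tag, while the pattern expects them to be directly adjacent. Below is a minimal sketch, assuming that whitespace is the only difference, that allows \s* between the tags and reads the capture through Groups[1].Value. The class and method names (StatusExtractor, TryExtractStatus) are made up for illustration, and the closing </div> is left out of the pattern because the sample in the question is cut off before it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StatusExtractor
{
    // Hypothetical helper: returns true and sets status when the pattern matches.
    public static bool TryExtractStatus(string html, out string status)
    {
        // \s* tolerates the line break and indentation between the two tags
        // (an assumption based on the HTML sample in the question).
        const string pattern =
            @"<div class=""alt"" title=""[\w\s]+"">\s*<strong>([\w\s]+)</strong>";

        Match match = Regex.Match(html, pattern);

        // Groups is an indexed property, not a method:
        // Groups[0] is the whole match, Groups[1] is the first capture group.
        status = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        string html =
            "<div class=\"alt\" title=\"Released to Manufacturing\">\n" +
            "        <strong>Released to Manufacturing</strong>";

        string status;
        if (TryExtractStatus(html, out status))
        {
            Console.WriteLine(status); // prints "Released to Manufacturing"
        }
    }
}
```

Checking Success first matters here, because a failed match still hands back a Match object whose groups are empty strings, which looks exactly like "blank text".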

Answer 2

Regular expressions are not meant for parsing HTML.

Use an HTML parser such as HtmlAgilityPack instead:

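   // Needs the HtmlAgilityPack package, plus: using System.Linq; using HtmlAgilityPack;
   HtmlDocument doc = new HtmlDocument();
   doc.Load(yourStream);
   // Note: SelectNodes returns null when nothing matches, so check before relying on the result.
   var altElementValues = doc.DocumentNode
                             .SelectNodes("//div[@class='alt']/strong")
                             .Select(x => x.InnerText);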

Getting the value of a group in Regex
