Question

<td width="100%"><h1>Chicago, IL Weather</h1></td>

I want to get the text inside the h1 tag. I would like to do this with a regular expression in C#. Can anyone tell me a solution?

Answer 1

    // `line` is the HTML input and `FileContent` is a string declared elsewhere.
    // Capture the text between the tags; [^>]* allows attributes on <h1>.
    System.Text.RegularExpressions.Regex bodyRegex = new System.Text.RegularExpressions.Regex(@"<h1[^>]*>([\u0000-\uFFFF]+?)</h1>");
    System.Text.RegularExpressions.Match bodyMatch = bodyRegex.Match(line);
    if (bodyMatch.Success)
    {
        // Group 1 holds only the inner text, so no tag stripping is needed.
        FileContent = bodyMatch.Groups[1].Value;
    }

Using this, you can find the value of the first h1 tag.

Answer 2

Try this:

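    // Requires: using System; using System.Text.RegularExpressions;
    // `Data` is the string that holds the HTML.
    String h1Regex = "<h1[^>]*?>(?<TagText>.*?)</h1>";

    // Singleline lets '.' match newlines inside the tag body.
    MatchCollection mc = Regex.Matches(Data, h1Regex, RegexOptions.Singleline);

    foreach (Match m in mc) {
        Console.WriteLine(m.Groups["TagText"].Value);
    }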
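
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The class name `H1Extractor`, the method `ExtractH1Texts`, and the sample input (copied from the question) are illustrative assumptions, not part of the original answer. `RegexOptions.IgnoreCase` is an addition so that `<H1>` matches as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the original answer.
public static class H1Extractor
{
    // Same pattern as above: optional attributes, then a lazy capture of the tag body.
    private const string H1Pattern = "<h1[^>]*?>(?<TagText>.*?)</h1>";

    public static List<string> ExtractH1Texts(string html)
    {
        var texts = new List<string>();
        // Singleline lets '.' match newlines; IgnoreCase also accepts <H1>.
        foreach (Match m in Regex.Matches(html, H1Pattern,
                     RegexOptions.Singleline | RegexOptions.IgnoreCase))
        {
            texts.Add(m.Groups["TagText"].Value.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        string html = "<td width=\"100%\"><h1>Chicago, IL Weather</h1></td>";
        foreach (string text in ExtractH1Texts(html))
        {
            Console.WriteLine(text); // prints: Chicago, IL Weather
        }
    }
}
```

Note that the capture returns whatever sits between the tags, so a heading like `<h1><b>Title</b></h1>` would come back with the inner `<b>` markup still in it.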

Answer 3

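Why do you want a regex? I know it is the quickest way, but it has drawbacks too: it hurts the readability of your code,

and if your HTML file changes, writing a new regular expression will be very painful.

Unless you really have to, don't use a regex; use an HTML parser instead (such as the HTMLAgilityPack mentioned above).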
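
As a rough sketch of the parser route, assuming the third-party HtmlAgilityPack NuGet package is installed: the class name `H1Parser` and the method `FirstH1Text` are made up for illustration, and the sample input is the one from the question.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

// Hypothetical helper for illustration; not taken from any answer in this thread.
public static class H1Parser
{
    // Returns the text of the first <h1> element, or null when there is none.
    public static string FirstH1Text(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // XPath "//h1" selects the first h1 anywhere in the document.
        HtmlNode h1 = doc.DocumentNode.SelectSingleNode("//h1");
        return h1 == null ? null : h1.InnerText.Trim();
    }

    public static void Main()
    {
        string html = "<td width=\"100%\"><h1>Chicago, IL Weather</h1></td>";
        Console.WriteLine(FirstH1Text(html));
    }
}
```

Unlike the regex versions, `InnerText` drops nested tags such as `<b>` inside the heading, and the XPath does not need rewriting when attributes on the tag change.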

Regular expression to read tags in HTML
