Regex to match a word in a string, excluding matches inside HTML anchor elements

Asked: 2010-10-19 08:55:38

Tags: c# regex

Consider this short piece of text:

@"
I want to match  the word 'highlight' in a string. But I don't want to match
highlight when it is contained in an HTML anchor element. The expression
should not match highlight in the following text: <a href='#'>highlight</a>
"

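This is what the output should look like (matches shown in bold, marked here with **):

I want to match the word '**highlight**' in a string. But I don't want to match **highlight** when it is contained in an HTML anchor element. The expression should not match **highlight** in the following text: highlight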

How can I construct an expression that matches every occurrence of X, excluding occurrences inside HTML anchor elements?

1 Answer:

Answer 0: (score: 2)

I know you asked for a regex, but I wouldn't do it that way. Instead, here is a solution using the Html Agility Pack.

using System;
using System.Linq;
using HtmlAgilityPack;

public static class HighlightExample
{
    public static void Parse()
    {
        string htmlFragment =
            @"
    I want to match  the word 'highlight' in a string. But I don't want to match
    highlight when it is contained in an HTML anchor element. The expression
    should not match highlight in the following text: <a href='#'>highlight</a> more
    ";
        HtmlDocument htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(htmlFragment);
        // Walk every node in the parsed fragment and print the text nodes that pass the filter.
        foreach (HtmlNode node in htmlDocument.DocumentNode.Descendants().Where(FilterTextNodes()))
        {
            Console.WriteLine(node.OuterHtml);
        }
    }

    // Keeps text nodes that contain "highlight" and are not inside an <a> element.
    // Checking all ancestors (not only the direct parent) also excludes text such as
    // <a href='#'><b>highlight</b></a>.
    private static Func<HtmlNode, bool> FilterTextNodes()
    {
        return node => node.NodeType == HtmlNodeType.Text
            && !node.Ancestors("a").Any()
            && node.OuterHtml.Contains("highlight");
    }
}
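
The code above prints whole text nodes rather than the individual matches the question asks for. A minimal sketch of one way to go a step further: run an ordinary regex over each text node that is not inside an anchor. The class name AnchorAwareMatcher and the method FindOutsideAnchors are hypothetical names chosen for illustration, and the sketch assumes the Html Agility Pack package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class AnchorAwareMatcher
{
    // Hypothetical helper: returns every match of `pattern` found in text
    // that is not inside an <a> element. Each Match.Index is relative to
    // the text node it was found in, not to the whole HTML string.
    public static List<Match> FindOutsideAnchors(string html, string pattern)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        return document.DocumentNode
            .Descendants()
            // Keep only text nodes with no <a> element among their ancestors.
            .Where(node => node.NodeType == HtmlNodeType.Text
                           && !node.Ancestors("a").Any())
            // Apply the regex to each remaining text node.
            .SelectMany(node => Regex.Matches(node.InnerText, pattern).Cast<Match>())
            .ToList();
    }
}
```

For example, AnchorAwareMatcher.FindOutsideAnchors("highlight one <a href='#'>highlight</a> highlight two", @"\bhighlight\b") should return two matches, because the occurrence inside the anchor is skipped. The parser decides what counts as "inside an anchor", so the regex itself stays simple and does not need to understand HTML.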