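Cf. "Finding HTML strings in document" and similar questions.
I have seen examples that use HtmlAgilityPack to parse a string and look for specific tags, but what if I want to make sure the input string contains only the tags listed in a List<string> AllowedTags?
In other words, how do I iterate over doc.DocumentNode.Descendants to identify each tag name and check whether it is in the list?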
Answer 0: (score: 3)
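// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
// doc is the HtmlDocument from the question, already loaded with the input string.
var allowedTags = new List<string> { "html", "head", "body", "div" };

// True only if every element node's tag name appears in allowedTags.
// Text and comment nodes are skipped by the NodeType filter.
bool containsOnlyAllowedTags =
    doc.DocumentNode
       .Descendants()
       .Where(n => n.NodeType == HtmlNodeType.Element)
       .All(n => allowedTags.Contains(n.Name));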
Answer 1: (score: 2)
// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
List<string> AllowedTags = new List<string>() { "br", "a" };

// Only <a> and <br> appear here, so no tag names remain after Except().
HtmlDocument goodDoc = new HtmlDocument();
goodDoc.LoadHtml("<a href='asdf'>asdf</a><br /><a href='qwer'>qwer</a>");
bool containsBadTags = goodDoc.DocumentNode
    .Descendants()
    .Where(node => node.NodeType == HtmlNodeType.Element)
    .Select(node => node.Name)
    .Except(AllowedTags)
    .Any();   // false

// The <b> element is not in AllowedTags, so it survives Except().
HtmlDocument badDoc = new HtmlDocument();
badDoc.LoadHtml("<a href='asdf'><b>asdf</b></a><br /><a href='qwer'>qwer</a>");
containsBadTags = badDoc.DocumentNode
    .Descendants()
    .Where(node => node.NodeType == HtmlNodeType.Element)
    .Select(node => node.Name)
    .Except(AllowedTags)
    .Any();   // true
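
If you also want to know which tags caused the failure, the same query can be wrapped in a small helper. This is only a sketch: TagWhitelist and FindDisallowedTags are hypothetical names, not part of HtmlAgilityPack, and the case-insensitive HashSet is an assumption added so that an entry such as "BR" in the allowed list still matches. It uses the same Descendants() and NodeType calls as the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TagWhitelist
{
    // Hypothetical helper (not an HtmlAgilityPack API): returns the distinct
    // element names found in html that are not listed in allowedTags.
    public static List<string> FindDisallowedTags(string html, IEnumerable<string> allowedTags)
    {
        // Assumption: the whitelist should be compared case-insensitively.
        var allowed = new HashSet<string>(allowedTags, StringComparer.OrdinalIgnoreCase);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants()
            .Where(node => node.NodeType == HtmlNodeType.Element) // skip text and comment nodes
            .Select(node => node.Name)
            .Where(name => !allowed.Contains(name))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

For example, calling TagWhitelist.FindDisallowedTags(input, new[] { "br", "a" }) on the badDoc string from Answer 1 should report the b element, and an empty result means the input contains only allowed tags.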