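I have been asked to inject a chunk of HTML at a specific point in an HTML document, and I have been looking at using HTMLAgilityPack to do it. As far as I can tell, the recommended approach is to parse the document into nodes and then replace or remove the relevant nodes.
This is my code so far: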
//Load original HTML
var originalHtml = new HtmlDocument();
originalHtml.Load(@"C:\Temp\test.html");
//Load inject HTML
var inject = new HtmlDocument();
inject.Load(@"C:\Temp\Temp\inject.html");
var injectNode = HtmlNode.CreateNode(inject.Text);
//Get all HTML nodes to inject/delete
var nodesToDelete = originalHtml.DocumentNode.SelectNodes("//p[@style='page-break-after:avoid']");
var countToDelete = nodesToDelete.Count();
//loop through stuff to remove
int count = 0;
foreach (var nodeToDelete in nodesToDelete)
{
    count++;
    if (count == 1)
    {
        //replace with inject HTML
        nodeToDelete.ParentNode.ReplaceChild(injectNode, nodeToDelete);
    }
    else if (count <= countToDelete)
    {
        //remove, as HTML already injected
        nodeToDelete.ParentNode.RemoveChild(nodeToDelete);
    }
}
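What I am finding is that the original HTML is not updated correctly: it looks as though only the top-level (parent) node is injected, without any of its child nodes.
Any help?
Thanks,
Patrick.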
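Answer 0 (score: 0)
Well, I couldn't work out how to do this with HTMLAgilityPack, probably more because of my limited understanding of nodes than anything else, but I did find a simple solution using AngleSharp.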
//Load original HTML into document
var parser = new HtmlParser();
var htmlDocument = parser.Parse(File.ReadAllText(@"C:\Temp\test.html"));
//Load inject HTML as raw text
var injectHtml = File.ReadAllText(@"C:\Temp\inject.html");
//Get all HTML elements to inject/delete
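//ToList() takes a snapshot of the matches, so the document can be modified while looping over them
var elements = htmlDocument.All.Where(e => e.Attributes.Any(a => a.Name == "style" && a.Value == "page-break-after:avoid")).ToList();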
//loop through stuff to remove
int count = 1;
foreach (var element in elements)
{
    if (count == 1)
    {
        //replace with inject HTML
        element.OuterHtml = injectHtml;
    }
    else
    {
        //remove, as HTML already injected
        element.Remove();
    }
    count++;
}
//Re-write updated file
File.WriteAllText(@"C:\Temp\test_updated.html", string.Format("{0}{1}{2}{3}","<html>",htmlDocument.Head.OuterHtml,htmlDocument.Body.OuterHtml,"</html>"));
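For anyone who would rather stay with HTMLAgilityPack, here is a minimal sketch of one way the same thing might be done. It rests on an assumption about the cause of the original problem: HtmlNode.CreateNode produces a single node, so an inject file with several top-level nodes would not come across whole. The sketch parses the inject HTML as its own document and inserts each of its top-level nodes before the first match. The HtmlInjector class and its Inject method are hypothetical names made up for illustration, and the code has not been run against the original files.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class HtmlInjector
{
    // Inserts every top-level node of fragmentHtml in place of the first
    // node matching xpath, removes all matching nodes, and returns the
    // resulting HTML as a string.
    public static string Inject(string originalHtml, string fragmentHtml, string xpath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(originalHtml);

        // Parse the fragment as its own document so that all of its
        // top-level nodes are available, not just one.
        var fragment = new HtmlDocument();
        fragment.LoadHtml(fragmentHtml);

        // SelectNodes returns null when nothing matches.
        var targets = document.DocumentNode.SelectNodes(xpath);
        if (targets == null)
        {
            return document.DocumentNode.OuterHtml;
        }

        // Insert the fragment's nodes, in order, just before the first match.
        var first = targets[0];
        foreach (var child in fragment.DocumentNode.ChildNodes.ToList())
        {
            first.ParentNode.InsertBefore(child, first);
        }

        // Remove every match, including the first, now that the HTML is injected.
        foreach (var target in targets)
        {
            target.ParentNode.RemoveChild(target);
        }

        return document.DocumentNode.OuterHtml;
    }
}
```

With the files from the question, this could be called as HtmlInjector.Inject(File.ReadAllText(@"C:\Temp\test.html"), File.ReadAllText(@"C:\Temp\inject.html"), "//p[@style='page-break-after:avoid']") and the returned string written out with File.WriteAllText. Working on strings rather than file paths keeps the helper easy to try on small snippets first.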