Injecting HTML at a specific location using HTMLAgilityPack

Posted: 2018-10-03 02:11:30

Tags: c# html html-agility-pack

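I've been asked to inject a chunk of HTML at a specific point in an HTML document, and I've been looking at using HTMLAgilityPack to do it. As far as I can tell, the recommended way is to parse the document into nodes and then replace/remove the relevant nodes.

Here is my code so far: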

//Load original HTML
var originalHtml = new HtmlDocument();
originalHtml.Load(@"C:\Temp\test.html");

//Load inject HTML
var inject = new HtmlDocument();
inject.Load(@"C:\Temp\Temp\inject.html");
var injectNode = HtmlNode.CreateNode(inject.Text);

//Get all HTML nodes to inject/delete
var nodesToDelete = originalHtml.DocumentNode.SelectNodes("//p[@style='page-break-after:avoid']");
var countToDelete = nodesToDelete.Count();

//loop through stuff to remove
int count = 0;
foreach (var nodeToDelete in nodesToDelete)
{
    count++;
    if (count == 1)
    {
        //replace with inject HTML
        nodeToDelete.ParentNode.ReplaceChild(injectNode, nodeToDelete);
    }
    else if (count <= countToDelete)
    {
        //remove, as HTML already injected
        nodeToDelete.ParentNode.RemoveChild(nodeToDelete);
    }
}

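What I'm finding is that the original HTML isn't updated correctly: only the top-level (parent) node seems to get injected, on its own, without any of its child nodes.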

Any help?

Thanks

Patrick.

1 answer:

Answer 0 (score: 0)

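Well, I couldn't work out how to do this with HTMLAgilityPack, probably more because of my limited understanding of nodes than anything else, but I did find a simple solution using AngleSharp: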

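//Written against AngleSharp 0.9.x; in 0.10 and later HtmlParser lives in AngleSharp.Html.Parser and Parse was renamed ParseDocument
using System.IO;
using System.Linq;
using AngleSharp.Parser.Html;

//Load original HTML into document
var parser = new HtmlParser();
var htmlDocument = parser.Parse(File.ReadAllText(@"C:\Temp\test.html"));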

//Load inject HTML as raw text
var injectHtml = File.ReadAllText(@"C:\Temp\inject.html");

//Get all HTML elements to inject/delete; ToList() takes a snapshot so the DOM can safely be modified inside the loop
var elements = htmlDocument.All.Where(e => e.Attributes.Any(a => a.Name == "style" && a.Value == "page-break-after:avoid")).ToList();

//loop through stuff to remove
int count = 1;
foreach (var element in elements)
{
    if (count == 1)
    {
        //replace with inject HTML
        element.OuterHtml = injectHtml;
    }
    else
    {
        //remove, as HTML already injected
        element.Remove();
    }
    count++;
}

//Re-write updated file (DocumentElement.OuterHtml is the whole <html> element, head and body included; the doctype is not written)
File.WriteAllText(@"C:\Temp\test_updated.html", htmlDocument.DocumentElement.OuterHtml);
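For anyone who would rather stay with HTMLAgilityPack: a likely cause of the original problem is `HtmlNode.CreateNode(inject.Text)`. As far as I know, depending on the library version, `CreateNode` either returns only the first top-level node of the parsed string (`DocumentNode.FirstChild`) or throws when there is more than one, so an inject file with several top-level elements loses everything after the first. The sketch below is not tested against the original files; the `HtmlInjector` class and `InjectAtFirstMarker` method are hypothetical names of my own, not part of the library. It parses the fragment into its own `HtmlDocument`, inserts a deep copy of every top-level node before the first marker, and then removes all markers.

```csharp
using HtmlAgilityPack;

// Hypothetical helper (not part of HtmlAgilityPack).
public static class HtmlInjector
{
    // Replaces the first node matching markerXPath with ALL top-level nodes
    // parsed from injectHtml, then removes every matching marker node.
    public static void InjectAtFirstMarker(HtmlDocument target, string injectHtml, string markerXPath)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var markers = target.DocumentNode.SelectNodes(markerXPath);
        if (markers == null) return;

        // Parse the fragment as its own document instead of using CreateNode,
        // so that no top-level node is dropped.
        var fragment = new HtmlDocument();
        fragment.LoadHtml(injectHtml);

        var first = markers[0];            // first match in document order
        var parent = first.ParentNode;

        // Insert a deep copy of each top-level fragment node, in order,
        // before the first marker. Cloning leaves the fragment's own
        // ChildNodes collection untouched while we iterate over it.
        foreach (var child in fragment.DocumentNode.ChildNodes)
        {
            parent.InsertBefore(child.CloneNode(true), first);
        }

        // Remove every marker, including the one the fragment was inserted before.
        foreach (var marker in markers)
        {
            marker.ParentNode.RemoveChild(marker);
        }
    }
}
```

Used with the paths from the question, this would be `var doc = new HtmlDocument(); doc.Load(@"C:\Temp\test.html"); HtmlInjector.InjectAtFirstMarker(doc, File.ReadAllText(@"C:\Temp\inject.html"), "//p[@style='page-break-after:avoid']"); doc.Save(@"C:\Temp\test_updated.html");`. Saving through `HtmlDocument.Save` writes the whole parsed document back out, so there is no need to rebuild the `<html>` wrapper by hand.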