Html Agility Pack: HTML nodes do not appear in the correct order

Time: 2014-08-06 06:37:22

Tags: c# asp.net-mvc html-agility-pack

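I have been working with Html Agility Pack without success: the structure of the resulting HTML does not come out in the right order.

This is the HTML I am trying to read (simplified):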

<table>...</table>

As you can see, it has no <html><head></head><body></body></html> wrapper.

This is the code I have so far:

HtmlDocument html = new HtmlDocument();
html.LoadHtml(HttpUtility.HtmlDecode(str_html));

//check if <html> exists. If not create <html>
var htmlNode = html.DocumentNode.SelectSingleNode("//html");
if (htmlNode == null)
{
    htmlNode = html.CreateElement("html");
    var htmlCollection = html.DocumentNode.ChildNodes;
    htmlNode.AppendChildren(htmlCollection);
    html.DocumentNode.RemoveAllChildren();
    html.DocumentNode.PrependChild(htmlNode);
}

//check if <head> exists, if not create <head>
HtmlNode head = html.DocumentNode.SelectSingleNode("//head");
HtmlNode cssLink = html.DocumentNode.SelectSingleNode("//link[contains(@href, '/assets/global/css/reset.css')]");
if (head != null)
{
    //if <link> does not exist, create <link> to reset.css
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}
else
{
    //<head> does not exist: create it and add it to the document
    var htmlNode2 = html.DocumentNode.SelectSingleNode("//html");
    head = html.CreateElement("head");
    var htmlCollection = html.DocumentNode.ChildNodes;
    html.DocumentNode.PrependChild(head);

    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}

//check if <body> exists, if yes, add style='margin:0; padding:0'
HtmlNode htmlBody = html.DocumentNode.SelectSingleNode("//body");
if (htmlBody != null)
    htmlBody.SetAttributeValue("style", "margin: 0; padding: 0;");

//remove <script> and <iframe> references
html.DocumentNode.Descendants()
                .Where(n => n.Name == "script" || n.Name == "iframe")
                .ToList()
                .ForEach(n => n.Remove());

str_html = html.DocumentNode.OuterHtml;

This is the output:

<head><link rel="stylesheet" href="/assets/global/css/reset.css"></head><html><table>...</table></html>

Why does <head> appear before <html>? I also tried .AppendChild, but that produced the following:

<html><table>asome stuff </table></html><head></html><link rel="stylesheet" href="/assets/global/css/reset.css">

I need the output to be:

<html><head>some stuff</head><body></body></html>

Any help is appreciated. Thanks.

1 answer:

Answer 0 (score: 1)

For example, you can try prepending <head> as a child of <html> (unrelated code removed for clarity):

var str_html = "<table>...</table>";
.....
if (head != null)
{
    .....
}
else
{
    head = html.CreateElement("head");
    var htmlCollection = html.DocumentNode.ChildNodes;
    htmlNode.PrependChild(head); //I only added this line to your existing code

    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}

The output is now in the correct order:

<html><head><link rel="stylesheet" href="/assets/global/css/reset.css"></head><table>...</table></html>
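
The underlying cause is that html.DocumentNode is the root container of the whole document, so prepending or appending to it makes <head> a sibling of <html> rather than its child. The output above also still lacks the <body> element that the question asked for. Below is a minimal sketch of how all three wrappers could be ensured in one pass. The names HtmlSkeleton and EnsureSkeleton are hypothetical and not part of the original code, and the stylesheet path is passed as a plain string instead of going through Url.Content, so treat it as an illustration under those assumptions rather than a drop-in replacement.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class HtmlSkeleton
{
    // Hypothetical helper (not from the original post): returns markup shaped
    // like <html><head>...</head><body>...</body></html> for a given fragment.
    public static string EnsureSkeleton(string fragment, string cssHref)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment);

        // 1. Make sure <html> exists and owns every top-level node.
        var htmlNode = doc.DocumentNode.SelectSingleNode("//html");
        if (htmlNode == null)
        {
            htmlNode = doc.CreateElement("html");
            // Snapshot the children before moving them out of the live collection.
            var topLevel = doc.DocumentNode.ChildNodes.ToList();
            doc.DocumentNode.RemoveAllChildren();
            foreach (var node in topLevel)
                htmlNode.AppendChild(node);
            doc.DocumentNode.AppendChild(htmlNode);
        }

        // 2. Attach <head> to <html>, not to the document root.
        var head = htmlNode.Element("head");
        if (head == null)
        {
            head = doc.CreateElement("head");
            htmlNode.PrependChild(head);
        }

        // 3. Add the stylesheet link only if <head> does not already have it.
        bool hasLink = head.Elements("link")
                           .Any(l => l.GetAttributeValue("href", "").Contains(cssHref));
        if (!hasLink)
        {
            var link = doc.CreateElement("link");
            link.SetAttributeValue("rel", "stylesheet");
            link.SetAttributeValue("href", cssHref);
            head.AppendChild(link);
        }

        // 4. Make sure <body> exists and holds everything except <head>.
        var body = htmlNode.Element("body");
        if (body == null)
        {
            body = doc.CreateElement("body");
            var rest = htmlNode.ChildNodes.Where(n => n != head).ToList();
            foreach (var node in rest)
            {
                node.Remove();          // detach from <html>
                body.AppendChild(node); // re-attach under <body>
            }
            htmlNode.AppendChild(body);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

It could be called as HtmlSkeleton.EnsureSkeleton("<table>...</table>", "/assets/global/css/reset.css"). Unlike the question's code, this sketch only looks for an existing <link> inside <head>, and it does not handle a <head> or <body> nested somewhere other than directly under <html>.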