I have been using HTMLAgility, but to no avail: the HTML structure does not come out correctly.
Here is the HTML I am trying to read (simplified):
<table>...</table>
As you can see, there is no <html><head></head><body></body></html> wrapper.
Here is my code so far:
HtmlDocument html = new HtmlDocument();
html.LoadHtml(HttpUtility.HtmlDecode(str_html));

// check if <html> exists; if not, create <html>
var htmlNode = html.DocumentNode.SelectSingleNode("//html");
if (htmlNode == null)
{
    htmlNode = html.CreateElement("html");
    var htmlCollection = html.DocumentNode.ChildNodes;
    htmlNode.AppendChildren(htmlCollection);
    html.DocumentNode.RemoveAllChildren();
    html.DocumentNode.PrependChild(htmlNode);
}

// check if <head> exists; if not, create <head>
HtmlNode head = html.DocumentNode.SelectSingleNode("//head");
HtmlNode cssLink = html.DocumentNode.SelectSingleNode("//link[contains(@href, '/assets/global/css/reset.css')]");
if (head != null)
{
    // if <link> does not exist, create a <link> to reset.css
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}
else
{
    //
    var htmlNode2 = html.DocumentNode.SelectSingleNode("//html");
    head = html.CreateElement("head");
    var htmlCollection = html.DocumentNode.ChildNodes;
    html.DocumentNode.PrependChild(head);
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}

// check if <body> exists; if it does, add style='margin: 0; padding: 0'
HtmlNode htmlBody = html.DocumentNode.SelectSingleNode("//body");
if (htmlBody != null)
    htmlBody.SetAttributeValue("style", "margin: 0; padding: 0;");

// remove <script> and <iframe> elements
html.DocumentNode.Descendants()
    .Where(n => n.Name == "script" || n.Name == "iframe")
    .ToList()
    .ForEach(n => n.Remove());

str_html = html.DocumentNode.OuterHtml;
This is the output:
<head><link rel="stylesheet" href="/assets/global/css/reset.css"></head><html><table>...</table></html>
Why is the <head> placed in front of <html>? I also tried .AppendChild, but it produced the following:
<html><table>some stuff</table></html><head><link rel="stylesheet" href="/assets/global/css/reset.css"></head>
I need the output to look like this:
<html><head>some stuff</head><body></body></html>
Any help is appreciated. Thanks.
Answer
For example, you could try prepending <head> as a child of <html> (unrelated code removed for clarity):
var str_html = "<table>...</table>";
.....
if (head != null)
{
    .....
}
else
{
    head = html.CreateElement("head");
    var htmlCollection = html.DocumentNode.ChildNodes;
    htmlNode.PrependChild(head); // I only added this line to your existing code
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}
The output is now in the correct order:
<html><head><link rel="stylesheet" href="/assets/global/css/reset.css"></head><table>...</table></html>
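The fix above only moves <head>; the question also asks for a <body> around the content. The sketch below shows the whole normalization under stated assumptions: it assumes the HtmlAgilityPack NuGet package is referenced, and the class and method name `HtmlSkeleton.EnsureSkeleton` are hypothetical, not part of the library or of the original code. The key point is the same as in the answer: new nodes are attached with `PrependChild`/`AppendChild` on the <html> element rather than on `DocumentNode`, and existing nodes are detached with `Remove()` before being re-parented.

```csharp
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class HtmlSkeleton
{
    // Hypothetical helper (not an HtmlAgilityPack API): wraps a fragment such as
    // "<table>...</table>" so the result has <html><head>...</head><body>...</body></html>.
    public static string EnsureSkeleton(string fragment, string cssHref)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment);
        HtmlNode root = doc.DocumentNode;

        // 1. <html>: if missing, move every top-level node into a new <html> element.
        HtmlNode htmlNode = root.SelectSingleNode("//html");
        if (htmlNode == null)
        {
            htmlNode = doc.CreateElement("html");
            foreach (HtmlNode child in root.ChildNodes.ToList()) // snapshot before mutating
            {
                child.Remove();               // detach from the document root
                htmlNode.AppendChild(child);  // re-attach under <html>
            }
            root.AppendChild(htmlNode);
        }

        // 2. <head>: if missing, prepend it to <html> (not to the document root).
        HtmlNode head = htmlNode.ChildNodes.FirstOrDefault(n => n.Name == "head");
        if (head == null)
        {
            head = doc.CreateElement("head");
            htmlNode.PrependChild(head);
        }

        // 3. <body>: if missing, move everything under <html> except <head> into a new <body>.
        HtmlNode body = htmlNode.ChildNodes.FirstOrDefault(n => n.Name == "body");
        if (body == null)
        {
            body = doc.CreateElement("body");
            foreach (HtmlNode child in htmlNode.ChildNodes.ToList())
            {
                if (child == head) continue;  // <head> stays a direct child of <html>
                child.Remove();
                body.AppendChild(child);
            }
            htmlNode.AppendChild(body);       // <body> goes after <head>
        }

        // 4. Add the stylesheet link only if <head> does not already contain it.
        bool hasCss = head.ChildNodes.Any(
            n => n.Name == "link" && n.GetAttributeValue("href", "") == cssHref);
        if (!hasCss)
        {
            HtmlNode link = doc.CreateElement("link");
            link.SetAttributeValue("rel", "stylesheet");
            link.SetAttributeValue("href", cssHref);
            head.AppendChild(link);
        }

        // Reset the body margins, as in the question.
        body.SetAttributeValue("style", "margin: 0; padding: 0;");

        return root.OuterHtml;
    }
}
```

In the question's MVC context it might be called as `HtmlSkeleton.EnsureSkeleton(HttpUtility.HtmlDecode(str_html), Url.Content("/assets/global/css/reset.css"))`; the `<script>`/`<iframe>` removal from the question could be applied to `doc` before `OuterHtml` is read.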