I am trying to create an XHTML file from scratch using HtmlAgilityPack. Following the advice given in {{3}}, I tried to add a doctype to it:
private static HtmlDocument createEmptyDoc()
{
    HtmlDocument titlePage = new HtmlDocument();
    titlePage.OptionOutputAsXml = true;
    titlePage.OptionCheckSyntax = true;
    titlePage.AddDoctype();
    var html = titlePage.CreateElement("html");
    titlePage.DocumentNode.AppendChild(html);
    return titlePage;
}

public static class HtmlDocumentExtensions
{
    public static void AddDoctype(this HtmlDocument doc)
    {
        var doctype = doc.DocumentNode.PrependChild(doc.CreateComment("<!doctype html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"));
    }
}
However, when I write this document to a file, it looks like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<!--type html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.d-->
<html />
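The doctype is indeed treated as a comment, and some of its characters are replaced by dashes: the leading `<!doc` has become `<!--` and the trailing `td">` has become `-->`. How can I fix this so that the doctype is written to the file as-is?
Edit: added the custom extension for HtmlDocument.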
Answer 0 (score: 1)
static void Main(string[] args)
{
    // Sample input without a doctype.
    string html = @"
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
<table>
<tr>
<td>A!!</td>
<td>te2</td>
<td>2!!</td>
<td>te43</td>
<td></td>
<td> !!</td>
<td>.!!</td>
<td>te53</td>
<td>te2</td>
<td>texx</td>
</tr>
</table>
<h4 class=""nikstyle_title""><a rel=""nofollow"" target=""_blank"" href=""http://www.niksalehi.com/ccount/click.php?ref=ZDNkM0xuQmxjbk5wWVc1MkxtTnZiUT09&id=117""><span class=""text-matn-title-bold-black"">my text</span></a></h4>
</body>
</html>";
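    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    // HtmlAgilityPack exposes a doctype as a comment node, so look for one first.
    var doctype = doc.DocumentNode.SelectSingleNode("/comment()[starts-with(.,'<!DOCTYPE')]");
    // No doctype yet: prepend an empty comment node to hold it.
    if (doctype == null)
        doctype = doc.DocumentNode.PrependChild(doc.CreateComment());
    doctype.InnerHtml = "<!DOCTYPE html>";
    string html2 = doc.DocumentNode.InnerHtml;
}
The code in the other question already gives you a way to do this. The code above is a complete example.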
Answer 1 (score: 1)
Try this:
using HtmlAgilityPack;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument doc = new HtmlDocument();
            HtmlNode docNode = HtmlNode.CreateNode("<html><head></head><body></body></html>");
            HtmlNode rootNode = HtmlNode.CreateNode("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">");
            doc.DocumentNode.AppendChild(rootNode);
            doc.DocumentNode.AppendChild(docNode);
            doc.Save("test.html");
        }
    }
}
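For reference, the same idea can be folded back into the asker's createEmptyDoc helper. This is only a rough sketch, not a confirmed fix: the class name TitlePageFactory is made up for illustration, the XHTML 1.1 doctype string is the one from the question, and OptionOutputAsXml is deliberately left unset on the assumption that the XML output mode is what rewrites the comment node.

```csharp
using HtmlAgilityPack;

// Hypothetical helper class; the name is not from the original post.
public static class TitlePageFactory
{
    public static HtmlDocument CreateEmptyDoc()
    {
        var titlePage = new HtmlDocument();

        // Parse the doctype from markup (as in the answer above) rather than
        // wrapping it with CreateComment. Doctype string taken from the question.
        HtmlNode doctype = HtmlNode.CreateNode(
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" " +
            "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">");

        // Minimal document skeleton.
        HtmlNode html = HtmlNode.CreateNode("<html><head></head><body></body></html>");

        // Doctype first, then the root element.
        // Assumption: OptionOutputAsXml stays at its default (false).
        titlePage.DocumentNode.AppendChild(doctype);
        titlePage.DocumentNode.AppendChild(html);
        return titlePage;
    }
}
```

Calling TitlePageFactory.CreateEmptyDoc().Save("test.html") should then write the document in the same way as the answer's code. Whether the result is acceptable as XHTML without the XML output mode would still need to be checked against your own requirements.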