Hi everyone, I am using Html Agility Pack and Open XML to convert my HTML content into Word document content.
<div>
<div id="container">
<div>
<div>
<!--content starts here//-->
<form name="questions" method="post">
<img src="../../content/0/Static UPload/Divya_3LevelLeftMenu_Operating System v8.0 English/unit9/lesson27/../../images/less_title_27.jpg" width="750" height="75">
<div id="title">Exercise
<table border="0" cellspacing="20" cellpadding="0">
<tr>
<td><b> Student's Name: </b><br>
<input type="text" name="b1" size="45"></td>
<td><b>Class:</b><br>
<input type="text" name="b2" size="45"></td>
</tr>
</table>
<td width="176" align="left"> </td>
<tr><td width="779" align="left"> </td>
</tr>
<ol>
<li>Describe the purpose of Windows Update.
<p align="left"><textarea name="a1" rows="10" wrap="VIRTUAL" cols="55"></textarea></p>
</li>
</ol>
<ol start="2">
<li>Explain why using Windows Update is critical to maintaining an operating system.
<p align="left"><textarea name="a2" rows="10" wrap="VIRTUAL" cols="55"></textarea></p>
</li>
</ol>
<ol start="3">
<li>Summarize the process used to access and install Windows Updates.
<p align="left"><textarea name="a3" rows="10" wrap="VIRTUAL" cols="55"></textarea></p>
</li>
</ol>
<ol start="4">
<li>Compare and contrast using Windows Update and using a Windows Service Pack.
<p align="left"><textarea name="a4" rows="10" wrap="VIRTUAL" cols="55"></textarea></p>
</li>
</ol>
<center><p><b>Note: You must print your completed exercise
to submit to your instructor.</b><br>
<b class="style1"><u>Do Not</u></b> close this window without printing your exercise or your answers will be lost.<br><br>
<input onclick="reLoadMe(document.questions) " type="button" value="Print Preview">
</p>
</center>
</form>
<div align="center"><a href="#top"><img src="../../content/0/Static UPload/Divya_3LevelLeftMenu_Operating System v8.0 English/unit9/lesson27/../../images/back_to_top.jpg" alt="" width="40" height="21" border="0"></a>
</div></div></div></div></div></div>
This is the HTML content I am converting. However, I get the following error when parsing it.
at NotesFor.HtmlToOpenXml.TableContext.get_CurrentTable()
at NotesFor.HtmlToOpenXml.HtmlConverter.ProcessTableColumn(HtmlEnumerator en)
at NotesFor.HtmlToOpenXml.HtmlConverter.ProcessHtmlChunks(HtmlEnumerator en, String endTag)
at NotesFor.HtmlToOpenXml.HtmlConverter.Parse(String html)
at WebApplication3.WebForm3.Button1_Click(Object sender, EventArgs e) in C:\Users\USER\Documents\Visual Studio 2008\Projects\Piyush_training\WebApplication3\WebForm3.aspx.cs:line 102
My code is as follows.
using DocumentFormat.OpenXml.Drawing;
using NotesFor.HtmlToOpenXml;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using wp = DocumentFormat.OpenXml.Drawing.Wordprocessing;
using DocumentFormat.OpenXml;
using HtmlAgilityPack;
using System.Text;
protected void Button1_Click(object sender, EventArgs e)
{
const string filename = "C:/Temp/test.docx";
Response.ContentEncoding = System.Text.Encoding.UTF7;
System.Text.StringBuilder SB = new System.Text.StringBuilder();
System.IO.StringWriter SW = new System.IO.StringWriter();
string pagecontent = "..."; // placeholder for the HTML content shown above
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(pagecontent);
if (doc == null) ;
doc.OptionCheckSyntax = true;
doc.OptionAutoCloseOnEnd = true;
doc.OptionFixNestedTags = true;
int errorCount = doc.ParseErrors.Count();
string output = "";
doc.Save(SW);
System.Web.UI.HtmlTextWriter htmlTW = new System.Web.UI.HtmlTextWriter(SW);
strBody = "<html>" + "<body>" + "<div><b>" + htmlTW.InnerWriter.ToString() + "</b></div>" + "</body>" + "</html>";
string html = strBody;
try
{
using (MemoryStream generatedDocument = new MemoryStream())
{
using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = package.MainDocumentPart;
if (mainPart == null)
{
mainPart = package.AddMainDocumentPart();
new Document(new Body()).Save(mainPart);
}
HtmlConverter converter = new HtmlConverter(mainPart);
converter.ExcludeLinkAnchor = true;
converter.RefreshStyles();
converter.ImageProcessing = ImageProcessing.AutomaticDownload;
Body body = mainPart.Document.Body;
converter.ConsiderDivAsParagraph = false;
var paragraphs = converter.Parse(html);
for (int i = 0; i < paragraphs.Count; i++)
{
body.Append(paragraphs[i]);
}
mainPart.Document.Save();
}
File.WriteAllBytes(filename, generatedDocument.ToArray());
}
System.Diagnostics.Process.Start(filename);
}
catch (Exception ex)
{
Response.Write(ex.ToString());
}
}
Answer 0 (score: 2)
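You may want to try a different approach to assembling a Word document from HTML. Depending on your requirements, one option is:
altChunk, a feature of Open XML WordprocessingML markup that lets you embed an entire Open XML document or an HTML page at a specific location in a document.
Eric White has several blog posts describing the process. Below is an excerpt from his article. Note that this particular excerpt imports a .docx file as the chunk (AlternativeFormatImportPartType.WordprocessingML); HTML is embedded the same way with a different part type.
Using V2 of the Open XML SDK: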
using (WordprocessingDocument myDoc = WordprocessingDocument.Open("Test1.docx", true))
{
string altChunkId = "AltChunkId1";
MainDocumentPart mainPart = myDoc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
using (FileStream fileStream = File.Open("TestInsertedContent.docx", FileMode.Open))
chunk.FeedData(fileStream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document
.Body
.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
The full article, with sample code at the bottom: How to Use altChunk for Document Assembly
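If you would rather keep using HtmlToOpenXml, one thing may be worth checking first. The stack trace in the question ends in TableContext.get_CurrentTable(), called from ProcessTableColumn, and the posted HTML contains <td> and <tr> elements after </table> that are not inside any table. That may be what the converter trips over, although this is a guess from the trace and not something confirmed against the library source. The following is an untested sketch that uses Html Agility Pack to unwrap such orphan cells before parsing; the names HtmlCleanup and RemoveOrphanTableCells are hypothetical.

```csharp
using System.Linq;
using HtmlAgilityPack;

static class HtmlCleanup
{
    // Hypothetical helper: unwraps <tr>/<td>/<th> elements that have no
    // <table> ancestor, keeping their child nodes in place.
    public static string RemoveOrphanTableCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Materialize the list first so the tree can be modified safely.
        var orphans = doc.DocumentNode
            .Descendants()
            .Where(n => (n.Name == "td" || n.Name == "tr" || n.Name == "th")
                        && !n.Ancestors("table").Any())
            .ToList();

        foreach (var node in orphans)
        {
            // Second argument true = keep the grandchildren (the cell's content).
            node.ParentNode.RemoveChild(node, true);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

With this in place, the call in the question would become converter.Parse(HtmlCleanup.RemoveOrphanTableCells(html)).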
Answer 1 (score: 1)
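Use this approach to get the images along with the content: first create a .docx file, then add the HTML document to it in a second step. To use the altChunk method you must first create the file dynamically with some default content, because this approach does not work with a blank file: the altChunk is inserted after the last existing paragraph, so the document needs at least one.
1. Create a .docx file with a small amount of content.
2. Add the HTML content after the default content.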
try
{
// Step 1: default content, so the document contains at least one paragraph.
strBody = "<html>" + "<body>" + "<div> Word File </div>" + "</body>" + "</html>";
using (MemoryStream generatedDocument = new MemoryStream())
{
using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = package.MainDocumentPart;
if (mainPart == null)
{
mainPart = package.AddMainDocumentPart();
new Document(new Body()).Save(mainPart);
}
HtmlConverter converter = new HtmlConverter(mainPart);
converter.ExcludeLinkAnchor = true;
converter.RefreshStyles();
converter.ImageProcessing = ImageProcessing.AutomaticDownload;
converter.BaseImageUrl = new Uri(domainNameURL + "Images/");
Body body = mainPart.Document.Body;
converter.ConsiderDivAsParagraph = false;
var paragraphs = converter.Parse(strBody);
for (int i = 0; i < paragraphs.Count; i++)
{
body.Append(paragraphs[i]);
}
mainPart.Document.Save();
}
File.WriteAllBytes(filename, generatedDocument.ToArray());
}
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filename, true))
{
XNamespace w =
"http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace r =
"http://schemas.openxmlformats.org/officeDocument/2006/relationships";
string altChunkId = "AltChunkId1";
MainDocumentPart mainPart = myDoc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart("application/xhtml+xml", altChunkId);
using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
using (StreamWriter stringStream = new StreamWriter(chunkStream))
stringStream.Write(html);
XElement altChunk = new XElement(w + "altChunk",
new XAttribute(r + "id", altChunkId)
);
XDocument mainDocumentXDoc = GetXDocument(myDoc);
mainDocumentXDoc.Root
.Element(w + "body")
.Elements(w + "p")
.Last()
.AddAfterSelf(altChunk);
SaveXDocument(myDoc, mainDocumentXDoc);
}
System.Diagnostics.Process.Start(filename);
}
catch (Exception ex)
{
Response.Write(ex.ToString());
}
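The code above calls GetXDocument and SaveXDocument, which are not defined in this answer. They are presumably small helpers that read and write the main document part as LINQ to XML. A minimal sketch of what they might look like, assuming they only need to load and overwrite word/document.xml (the class name XDocumentHelpers is hypothetical):

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

static class XDocumentHelpers
{
    // Loads the main document part (word/document.xml) into an XDocument.
    public static XDocument GetXDocument(WordprocessingDocument doc)
    {
        using (Stream stream = doc.MainDocumentPart.GetStream(FileMode.Open, FileAccess.Read))
        using (XmlReader reader = XmlReader.Create(stream))
        {
            return XDocument.Load(reader);
        }
    }

    // Overwrites the main document part with the modified XDocument.
    public static void SaveXDocument(WordprocessingDocument doc, XDocument xdoc)
    {
        // FileMode.Create truncates the existing part content before writing.
        using (Stream stream = doc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))
        using (XmlWriter writer = XmlWriter.Create(stream))
        {
            xdoc.Save(writer);
        }
    }
}
```

This writes straight to the part stream, so it is best not to mix it with edits made through mainPart.Document on the same open package; the second using block above only touches the XDocument, which is consistent with that.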
Answer 2 (score: 0)
After reading the previous answers and the one at https://stackoverflow.com/a/18152334/1863970, I used this function to convert large HTML (with embedded images) to Word:
public static byte[] HtmlToWord(string html)
{
using (var generatedDocument = new MemoryStream())
{
using (var package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = package.MainDocumentPart;
if (mainPart == null)
{
mainPart = package.AddMainDocumentPart();
new Document(new Body()).Save(mainPart);
}
HtmlConverter converter = new HtmlConverter(mainPart);
Body body = mainPart.Document.Body;
string altChunkId = "myId";
var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body>" + html + "</body></html>"));
// Create alternative format import part.
var formatImportPart = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);
// Feed HTML data into format import part (chunk).
formatImportPart.FeedData(memoryStream);
var altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document.Body.Append(altChunk);
mainPart.Document.Save();
}
return generatedDocument.ToArray();
}
}