Question

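I'm trying to use Microsoft's OpenXML 2.5 library to create an OpenXML document. Everything works fine until I try to insert an HTML string into my document. I've scoured the web, and here is what I've come up with so far (trimmed to the part I'm having trouble with):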

Paragraph paragraph = new Paragraph();
Run run = new Run();

string altChunkId = "id1";
AlternativeFormatImportPart chunk =
       document.MainDocumentPart.AddAlternativeFormatImportPart(
           AlternativeFormatImportPartType.Html, altChunkId);
chunk.FeedData(new MemoryStream(Encoding.UTF8.GetBytes(ioi.Text)));
AltChunk altChunk = new AltChunk { Id = altChunkId };

run.AppendChild(new Break());

paragraph.AppendChild(run);
body.AppendChild(paragraph);

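Obviously, I haven't actually added the altChunk in this example, but I've tried appending it everywhere: to the run, the paragraph, the body, and so on. In every case, I'm unable to open the docx file in Word 2010.

This is driving me a little crazy because it seems like it should be straightforward (I'll admit I don't fully understand the AltChunk "thing"). I'd appreciate any help.

Side note: one thing I found interesting, though I don't know whether it's actually a problem, is this response, which says AltChunk corrupts the file when working from a MemoryStream. Can anybody confirm whether this is or isn't true?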

Answer 1

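I can reproduce the error "... there is a problem with the content" by using an incomplete HTML document as the content of the alternative format import part. For example, if you use the HTML snippet <h1>HELLO</h1>, MS Word is unable to open the document.

The code below shows how to add an AlternativeFormatImportPart to a Word document. (I tested the code with MS Word 2013.)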

using (WordprocessingDocument doc = WordprocessingDocument.Open(@"test.docx", true))
{
  string altChunkId = "myId";
  MainDocumentPart mainDocPart = doc.MainDocumentPart;

  var run = new Run(new Text("test"));
  var p = new Paragraph(new ParagraphProperties(
       new Justification() { Val = JustificationValues.Center }),
                     run);

  var body = mainDocPart.Document.Body;
  body.Append(p);        

  MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));

  // Uncomment the following line to create an invalid word document.
  // MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));

  // Create alternative format import part.
  AlternativeFormatImportPart formatImportPart =
     mainDocPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.Html, altChunkId);
  //ms.Seek(0, SeekOrigin.Begin);

  // Feed HTML data into format import part (chunk).
  formatImportPart.FeedData(ms);
  AltChunk altChunk = new AltChunk();
  altChunk.Id = altChunkId;

  mainDocPart.Document.Body.Append(altChunk);
}
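
The failure described in this answer comes from feeding Word a bare fragment rather than a complete HTML document. A minimal sketch of a guard against that, assuming a hypothetical helper named HtmlChunk.EnsureFullDocument (it is not part of the OpenXML SDK) and a deliberately naive check for an existing <html tag:

```csharp
using System;

public static class HtmlChunk
{
    // Hypothetical helper: wraps a bare fragment in a minimal HTML skeleton.
    // The "<html" check is naive and only meant as an illustration.
    public static string EnsureFullDocument(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Leave input alone if it already looks like a full document.
        if (html.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0)
            return html;

        return "<html><head></head><body>" + html + "</body></html>";
    }
}
```

The returned string could then be passed to Encoding.UTF8.GetBytes in place of the literal HTML string used above.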

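According to the Office OpenXML specification, the valid parent elements for the w:altChunk element are body, comment, docPartBody, endnote, footnote, ftr, hdr and tc. So, I've added the w:altChunk to the body element.

For more information on the w:altChunk element, see this MSDN link.

Edit

As pointed out by @user2945722, to make sure the OpenXml library correctly interprets the byte array as UTF-8, you should add the UTF-8 preamble. This can be done like this: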

// Concat and ToArray require "using System.Linq;".
MemoryStream ms = new MemoryStream(new UTF8Encoding(true).GetPreamble().Concat(Encoding.UTF8.GetBytes(htmlEncodedString)).ToArray());

This prevents characters such as é and ä from being rendered incorrectly.
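
The preamble trick can be pulled into a small reusable function. This is only a sketch: the class and method names (HtmlStreams.WithUtf8Preamble) are made up for illustration, and it relies solely on System.Text and System.Linq.

```csharp
using System.IO;
using System.Linq;
using System.Text;

public static class HtmlStreams
{
    // Hypothetical helper: returns a stream that starts with the UTF-8
    // preamble (bytes EF BB BF) followed by the UTF-8 encoded HTML.
    public static MemoryStream WithUtf8Preamble(string html)
    {
        var encoding = new UTF8Encoding(true);

        // GetBytes never emits the preamble itself, so prepend it explicitly.
        byte[] bytes = encoding.GetPreamble()
                               .Concat(encoding.GetBytes(html))
                               .ToArray();
        return new MemoryStream(bytes);
    }
}
```

The resulting stream is positioned at the start, so it can be handed directly to FeedData.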

Answer 2

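Same problem here, but with a completely different cause. It's worth a try if the accepted solution doesn't help: close the file after saving. In my case, that turned out to be the difference between a corrupt and a clean docx file. Oddly, most other operations work with just a Save() followed by program exit.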

String cid = "chunkid";
WordprocessingDocument document = WordprocessingDocument.Open("somefile.docx", true);
Body body = document.MainDocumentPart.Document.Body;
MemoryStream ms = new MemoryStream(System.Text.Encoding.UTF8.GetBytes("<html><head></head><body>hi</body></html>"));
AlternativeFormatImportPart formatImportPart = document.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, cid);
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = cid;
document.MainDocumentPart.Document.Body.Append(altChunk);
document.MainDocumentPart.Document.Save();
// here's the magic!
document.Close();
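
One way to make sure the document is always closed, even when an exception is thrown, is to wrap it in a using block so that Dispose() closes the package. The sketch below assumes the OpenXML SDK is referenced; the HtmlImporter class and AppendHtml method are hypothetical names, and path is assumed to point to an existing .docx file.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class HtmlImporter
{
    // Hypothetical helper: appends a complete HTML document to an existing .docx.
    public static void AppendHtml(string path, string html, string chunkId)
    {
        using (WordprocessingDocument document = WordprocessingDocument.Open(path, true))
        {
            MainDocumentPart mainPart = document.MainDocumentPart;
            AlternativeFormatImportPart part =
                mainPart.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.Html, chunkId);

            using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                part.FeedData(ms);
            }

            mainPart.Document.Body.Append(new AltChunk { Id = chunkId });
            mainPart.Document.Save();
        } // Dispose() runs here and closes the package.
    }
}
```

A call might look like HtmlImporter.AppendHtml("somefile.docx", "<html><head></head><body>hi</body></html>", "chunkid").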

Add HTML String to OpenXML (*.docx) Document
