Question

I have a Rich Text Box set up in an InfoPath form, and my program parses the InfoPath XML as follows:

    XPathNavigator formNameNode = root.SelectSingleNode("/my:myFields/my:Responses/my:Q1", nsMgr);
    string response1 = formNameNode.InnerXml;

The following code then opens the Word document and gets the plain-text content control named response1:

    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(ms, true))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;

        List<OpenXmlElement> sdtList = InfoPathToWord.GetContentControl(mainPart.Document, "response1");
        InfoPathToWord.AddRichText(0, response1, ref mainPart, ref sdtList);
    }

The code then calls InfoPathToWord.AddRichText, shown below:

    public static void AddRichText(int id, string rtfValue,
      ref MainDocumentPart mainPart, ref List<OpenXmlElement> sdtList)
    {
        if (sdtList.Count != 0)
        {
            id++;
            string altChunkId = "AltChunkId" + id;
            AlternativeFormatImportPart chunk =
              mainPart.AddAlternativeFormatImportPart(
              AlternativeFormatImportPartType.Xhtml, altChunkId);

            using (MemoryStream ms = new MemoryStream(System.Text.Encoding.Default.GetBytes(rtfValue)))
            {
                chunk.FeedData(ms);
                ms.Close();
            }

            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;

            InfoPathToWord.ReplaceContentControl(sdtList, altChunk);
        }
    }

Finally, the altChunk replaces the "response1" content control:

    public static void ReplaceContentControl(
      List<OpenXmlElement> sdtList, OpenXmlElement element)
    {
        if (sdtList.Count != 0)
        {
            foreach (OpenXmlElement sdt in sdtList)
            {
                OpenXmlElement parent = sdt.Parent;
                parent.InsertAfter(element, sdt);
                sdt.Remove();
            }
        }
    }

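The problem is that it replaces the text, but the formatting is wrong and "?" characters appear in the output text. I'm not sure whether this is caused by encoding. I also tried System.Text.Encoding.UTF8.GetBytes(rtfValue) and System.Text.Encoding.ASCII.GetBytes(rtfValue), but neither seemed to help.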

Could someone please tell me what I'm doing wrong?

Thanks in advance.

MAVE

Answer 1

I use a regex to clean the string before saving.

    ' Allow tabs and other printable characters
    html = Regex.Replace(html, "[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]", "")

    Dim ms As New MemoryStream(System.Text.Encoding.UTF8.GetBytes(html))
    ' Create the alternative format import part.
    Dim formatImportPart As AlternativeFormatImportPart = mainDocPart.AddAlternativeFormatImportPart("application/xhtml+xml", altChunkId)
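Since the question's code is C#, the same cleanup might look roughly like the sketch below. It only illustrates the idea from this answer: strip control characters (keeping tab, line feed, and carriage return) and then encode the result as UTF-8. The class and method names (`HtmlSanitizer`, `StripControlChars`, `ToUtf8Stream`) are hypothetical and not part of any library, and the pattern is written as a plain .NET regex, which takes no `/…/u` delimiters.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class HtmlSanitizer
{
    // C0 control characters except tab (\x09), LF (\x0A), CR (\x0D),
    // plus the C1 control range \x80-\x9F.
    private static readonly Regex ControlChars =
        new Regex(@"[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]", RegexOptions.Compiled);

    // Removes the control characters; tabs, line breaks, and printable text are kept.
    public static string StripControlChars(string html)
    {
        return ControlChars.Replace(html ?? string.Empty, string.Empty);
    }

    // Cleans the markup and encodes it as UTF-8 (no byte order mark is written).
    public static MemoryStream ToUtf8Stream(string html)
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(StripControlChars(html)));
    }
}
```

In the question's AddRichText, the stream returned by `HtmlSanitizer.ToUtf8Stream(rtfValue)` could be passed to `chunk.FeedData` in place of the `Encoding.Default` stream. Whether this is enough to get rid of the "?" characters is not confirmed here.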

Regex to remove all special characters from string?

UPDATE: After rigorous testing, I found that InfoPath RTF has too many character-encoding problems in a docx.

OpenXML - Infopath RichText Box to Word Document giving formatting errors
