Unbreakable words in docx

Date: 2013-04-03 15:02:32

Tags: openxml

I am trying to replace words in a docx file, as described here:

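using System.IO;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;

public static void SearchAndReplace(string document)
{
    // Open the package for editing (second argument: isEditable = true).
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        // Read the raw XML of the main document part into a string.
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        // Match on the raw XML; this only finds text stored in a single <w:t> element.
        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        // Overwrite the part with the modified XML.
        using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}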

This works fine, but sometimes, for the text SomeTest in the document, you get markup like this:

<w:r>
    <w:t>
        Some
    </w:t>
</w:r>

<w:r w:rsidR="009E5AFA">
    <w:rPr>
        <w:b/>
        <w:color w:val="365F91"/>
        <w:sz w:val="22"/>
    </w:rPr>
    <w:t>
        Test
    </w:t>
</w:r>

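And of course the replacement fails, because the word is split across two runs. Is there a workaround to make certain words unbreakable in docx? Or am I doing the replacement the wrong way?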
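To make the failure concrete, here is a minimal sketch that uses only System.Xml.Linq rather than the Open XML SDK. The class name SplitRunDemo, the helper ParagraphText, and the sample paragraph are made up for illustration. It shows that the raw XML never contains the contiguous string "SomeTest", while the concatenated text of the paragraph's w:t elements does.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SplitRunDemo
{
    // WordprocessingML main namespace (the "w:" prefix).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: concatenates the text of every w:t inside a paragraph.
    public static string ParagraphText(XElement paragraph)
    {
        return string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));
    }

    public static void Main()
    {
        // Illustrative paragraph: "SomeTest" split across two runs,
        // the second of which carries bold formatting.
        string xml =
            "<w:p xmlns:w=\"" + W.NamespaceName + "\">" +
            "<w:r><w:t>Some</w:t></w:r>" +
            "<w:r><w:rPr><w:b/></w:rPr><w:t>Test</w:t></w:r>" +
            "</w:p>";

        XElement p = XElement.Parse(xml);

        // The raw markup has tags between "Some" and "Test".
        Console.WriteLine(xml.Contains("SomeTest"));              // False

        // The visible text of the paragraph is contiguous.
        Console.WriteLine(ParagraphText(p).Contains("SomeTest")); // True
    }
}
```

Searching the concatenated paragraph text finds the word, but writing a replacement back still requires deciding which run receives the new text, which is why normalizing the markup first is attractive.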

1 answer:

Answer 0: (score: 3)

One way to solve this is to normalize the document's XML before doing the replacement. You can use the Open XML PowerTools to do this.

Sample code to normalize the XML:

using (WordprocessingDocument doc =
    WordprocessingDocument.Open("Test.docx", true))
{
    SimplifyMarkupSettings settings = new SimplifyMarkupSettings
    {
        NormalizeXml = true, // Merges runs in a paragraph with similar formatting
        // Additional settings if required; each Remove* option strips
        // the named markup from the document, so enable only what you need
        AcceptRevisions = true,
        RemoveBookmarks = true,
        RemoveComments = true,
        RemoveGoBackBookmark = true,
        RemoveWebHidden = true,
        RemoveContentControls = true,
        RemoveEndAndFootNotes = true,
        RemoveFieldCodes = true,
        RemoveLastRenderedPageBreak = true,
        RemovePermissions = true,
        RemoveProof = true,
        RemoveRsidInfo = true,
        RemoveSmartTags = true,
        RemoveSoftHyphens = true,
        ReplaceTabsWithSpaces = true
    };
    MarkupSimplifier.SimplifyMarkup(doc, settings);
}

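This simplifies the markup of the Open XML document, which makes it easier to transform the document programmatically afterwards. I always run it before working with Open XML documents in code.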
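To illustrate what merging runs means, here is a simplified sketch written with System.Xml.Linq only. It is not the PowerTools implementation, and the names RunMergeSketch, MergeAdjacentRuns, SameFormatting, and IsPlainTextRun are hypothetical. It assumes that two neighbouring runs can be merged when both contain nothing but an optional w:rPr and w:t children, and when their w:rPr elements are identical; attributes on the runs themselves (such as w:rsidR) are ignored, and the first run's attributes are kept.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RunMergeSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True when both runs have no w:rPr, or structurally equal w:rPr elements.
    static bool SameFormatting(XElement a, XElement b)
    {
        XElement x = a.Element(W + "rPr");
        XElement y = b.Element(W + "rPr");
        if (x == null || y == null) return x == null && y == null;
        return XNode.DeepEquals(x, y);
    }

    // True for a w:r that contains only w:rPr and w:t children.
    static bool IsPlainTextRun(XElement e)
    {
        return e.Name == W + "r"
            && e.Elements().All(c => c.Name == W + "rPr" || c.Name == W + "t");
    }

    // Folds each plain text run into the preceding one when the formatting matches.
    public static void MergeAdjacentRuns(XElement paragraph)
    {
        XElement previous = null;
        foreach (XElement current in paragraph.Elements().ToList())
        {
            if (previous != null
                && IsPlainTextRun(previous) && IsPlainTextRun(current)
                && SameFormatting(previous, current))
            {
                string merged =
                    string.Concat(previous.Elements(W + "t").Select(t => t.Value)) +
                    string.Concat(current.Elements(W + "t").Select(t => t.Value));

                // Replace the text of the first run and drop the second run.
                previous.Elements(W + "t").Remove();
                previous.Add(new XElement(W + "t",
                    new XAttribute(XNamespace.Xml + "space", "preserve"), merged));
                current.Remove();
            }
            else
            {
                previous = current;
            }
        }
    }

    public static void Main()
    {
        string ns = W.NamespaceName;
        XElement p = XElement.Parse(
            "<w:p xmlns:w=\"" + ns + "\">" +
            "<w:r><w:t>Some</w:t></w:r>" +
            "<w:r><w:t>Test</w:t></w:r>" +
            "<w:r><w:rPr><w:b/></w:rPr><w:t> bold</w:t></w:r>" +
            "</w:p>");

        MergeAdjacentRuns(p);

        Console.WriteLine(p.Elements(W + "r").Count());       // 2
        Console.WriteLine(p.Elements(W + "r").First().Value); // SomeTest
    }
}
```

Runs whose formatting differs are left separate, so text that spans differently formatted runs would still not match a regex applied to the raw XML.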

For more information on using these tools, see here, and there is a good blog post here.