I'm trying to replace words in a docx file, as described here:
public static void SearchAndReplace(string document)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}
This works fine, but sometimes the text SomeTest in the document ends up stored as something like this:
    <w:t>Some</w:t>
</w:r>
<w:r w:rsidR="009E5AFA">
    <w:rPr>
        <w:b/>
        <w:color w:val="365F91"/>
        <w:sz w:val="22"/>
    </w:rPr>
    <w:t>Test</w:t>
</w:r>
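Of course, the replacement then fails, because the word is split across two runs. Is there a workaround to make certain words unbreakable in a docx? Or am I doing the replacement the wrong way?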
Answer 0 (score: 3)
One way to solve this is to normalize the document's XML before doing the replacement. You can use the Open XML PowerTools to do that.
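To illustrate why a split run defeats a search over the raw markup, here is a minimal sketch that uses only the .NET base class library. The sample paragraph, the class name SplitRunDemo, and the method names are hypothetical and exist only for this illustration; the markup is a simplified version of the fragment in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SplitRunDemo
{
    // Main WordprocessingML namespace.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: the text "SomeTest" stored in two separate runs.
    public const string SampleXml =
        "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:r><w:t>Some</w:t></w:r>" +
        "<w:r><w:rPr><w:b/></w:rPr><w:t>Test</w:t></w:r>" +
        "</w:p>";

    // Searches the raw markup, as the regex approach in the question does.
    public static bool FoundInRawXml(string xml, string search)
    {
        return Regex.IsMatch(xml, Regex.Escape(search));
    }

    // Searches the visible text, i.e. all w:t values of the paragraph joined together.
    public static bool FoundInVisibleText(string xml, string search)
    {
        XElement paragraph = XElement.Parse(xml);
        string text = string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));
        return text.Contains(search);
    }

    public static void Main()
    {
        Console.WriteLine(FoundInRawXml(SampleXml, "SomeTest"));      // False
        Console.WriteLine(FoundInVisibleText(SampleXml, "SomeTest")); // True
    }
}
```

The raw markup never contains the contiguous string SomeTest because run and formatting tags sit between the two halves, which is why merging runs first makes the plain text replacement viable.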
Sample code for normalizing the XML:
using (WordprocessingDocument doc =
    WordprocessingDocument.Open("Test.docx", true))
{
    SimplifyMarkupSettings settings = new SimplifyMarkupSettings
    {
        NormalizeXml = true, // Merges runs in a paragraph that have similar formatting

        // Additional settings if required; several of these delete content
        // (comments, footnotes, content controls), so enable only what you need
        AcceptRevisions = true,
        RemoveBookmarks = true,
        RemoveComments = true,
        RemoveGoBackBookmark = true,
        RemoveWebHidden = true,
        RemoveContentControls = true,
        RemoveEndAndFootNotes = true,
        RemoveFieldCodes = true,
        RemoveLastRenderedPageBreak = true,
        RemovePermissions = true,
        RemoveProof = true,
        RemoveRsidInfo = true,
        RemoveSmartTags = true,
        RemoveSoftHyphens = true,
        ReplaceTabsWithSpaces = true
    };

    MarkupSimplifier.SimplifyMarkup(doc, settings);
}
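This simplifies the markup of the Open XML document, which makes further transformations much easier to do programmatically. I always run it before working with an Open XML document in code.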
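Putting the two steps together could look like the sketch below. It assumes the Open XML SDK and Open XML PowerTools packages are referenced; the class name NormalizedSearchAndReplace, the method name Run, and its parameters are hypothetical. It swaps the Regex from the question for a plain String.Replace and XML-escapes both strings, so that characters such as & or < in the search or replacement text do not corrupt the markup. The document is closed and reopened between the two steps as a cautious choice, so the simplified markup is saved before the stream is read.

```csharp
using System.IO;
using System.Security;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public static class NormalizedSearchAndReplace
{
    public static void Run(string path, string search, string replacement)
    {
        // Step 1: simplify the markup, as in the answer above, then close the
        // package so the simplified XML is written back to the file.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            SimplifyMarkupSettings settings = new SimplifyMarkupSettings
            {
                NormalizeXml = true,
                RemoveProof = true,
                RemoveRsidInfo = true
            };
            MarkupSimplifier.SimplifyMarkup(doc, settings);
        }

        // Step 2: reopen the file and replace the text in the main document part.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            string docText;
            using (StreamReader reader = new StreamReader(doc.MainDocumentPart.GetStream()))
            {
                docText = reader.ReadToEnd();
            }

            // Text inside the markup is XML-escaped, so escape both strings.
            docText = docText.Replace(
                SecurityElement.Escape(search),
                SecurityElement.Escape(replacement));

            using (StreamWriter writer = new StreamWriter(
                doc.MainDocumentPart.GetStream(FileMode.Create)))
            {
                writer.Write(docText);
            }
        }
    }
}
```

A call such as NormalizedSearchAndReplace.Run("Test.docx", "SomeTest", "OtherText") would then find the word once its runs have been merged. Note that runs with genuinely different formatting (for example, one half of the word bold and the other not) stay separate even after simplification, so a word split that way would still not match.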