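I've written some code to read the contents of a Word .dotx file. On my development machine everything works and the document contents are read, but on the server the same code causes a large spike in CPU usage and the document is never read (I don't know why).
Sample of the XML taken from the Word .dotx file: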
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
mc:Ignorable="w14 w15 wp14"><w:body><w:p w:rsidR="00276253"
w:rsidRDefault="00276253"><w:pPr><w:pStyle w:val="Header"/>
</w:pPr></w:p><w:p......
My method:
using System;
using System.IO;
using System.Text;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

public static string TextFromWordXMLFile(string path) {
    StringBuilder stringBuilder;
    try {
        using (WordprocessingDocument wordprocessingDocument =
                   WordprocessingDocument.Open(path, false)) {
            // Register the WordprocessingML namespace so XPath can use the "w" prefix.
            NameTable nameTable = new NameTable();
            XmlNamespaceManager xmlNamespaceManager =
                new XmlNamespaceManager(nameTable);
            xmlNamespaceManager.AddNamespace(
                "w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

            // Read the whole main document part into a string.
            string wordprocessingDocumentText;
            using (StreamReader streamReader = new StreamReader(
                       wordprocessingDocument.MainDocumentPart.GetStream())) {
                wordprocessingDocumentText = streamReader.ReadToEnd();
            }

            stringBuilder = new StringBuilder(wordprocessingDocumentText.Length);
            XmlDocument xmlDocument = new XmlDocument(nameTable);
            xmlDocument.LoadXml(wordprocessingDocumentText);

            // Walk every paragraph and collect its text, tabs and breaks in order.
            XmlNodeList paragraphNodes = xmlDocument.SelectNodes("//w:p",
                xmlNamespaceManager);
            foreach (XmlNode paragraphNode in paragraphNodes) {
                XmlNodeList textNodes = paragraphNode.SelectNodes(
                    ".//w:t | .//w:tab | .//w:br", xmlNamespaceManager);
                foreach (XmlNode textNode in textNodes) {
                    // Compare LocalName so the match does not depend on the prefix.
                    switch (textNode.LocalName) {
                        case "t":
                            stringBuilder.Append(textNode.InnerText);
                            break;
                        case "tab":
                            stringBuilder.Append("\t");
                            break;
                        case "br":
                            stringBuilder.Append("\v");
                            break;
                    }
                }
                stringBuilder.Append(Environment.NewLine);
                // Length is not defined in this snippet; it is assumed to be
                // a class-level cap on the output size declared elsewhere.
                if (stringBuilder.Length > Length) break;
            }
        }
        return stringBuilder.ToString();
    } catch (IOException) {
        // This is most likely because the file is in use, e.g. it is
        // currently open in Word: ignore.
    }
    return "";
}
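For comparison, the same extraction can be written as a single forward pass with XmlReader, so the main document part is never copied into a string or loaded into an XmlDocument. This is only a minimal sketch and has not been tried against the server problem described above. The class name WordXmlText, the method Extract and the maxLength parameter are hypothetical, and maxLength stands in for the Length value that is not shown in the method above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class WordXmlText
{
    private const string W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Streams WordprocessingML from `xml` and returns its plain text.
    // maxLength is a hypothetical cap: reading stops once it is exceeded.
    public static string Extract(Stream xml, int maxLength)
    {
        var sb = new StringBuilder();
        using (XmlReader reader = XmlReader.Create(xml))
        {
            bool more = reader.Read();
            while (more && sb.Length <= maxLength)
            {
                bool inW = reader.NamespaceURI == W;
                if (inW && reader.NodeType == XmlNodeType.Element)
                {
                    switch (reader.LocalName)
                    {
                        case "t":
                            // Reads the text and moves past </w:t>, so the
                            // reader must not be advanced again this round.
                            sb.Append(reader.ReadElementContentAsString());
                            more = !reader.EOF;
                            continue;
                        case "tab":
                            sb.Append('\t');
                            break;
                        case "br":
                            sb.Append('\v');
                            break;
                        case "p":
                            // A self-closing <w:p/> has no end tag to count.
                            if (reader.IsEmptyElement)
                                sb.Append(Environment.NewLine);
                            break;
                    }
                }
                else if (inW && reader.NodeType == XmlNodeType.EndElement
                         && reader.LocalName == "p")
                {
                    sb.Append(Environment.NewLine);
                }
                more = reader.Read();
            }
        }
        return sb.ToString();
    }
}
```

It would be called with the stream returned by wordprocessingDocument.MainDocumentPart.GetStream() inside the existing using block. Unlike the XPath version, the size cap is checked after every node instead of after every paragraph, and text in nested paragraphs (for example inside text boxes) is emitted only once.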
Any help would be greatly appreciated, as I'm really keen to get this working...
Thanks,
Ryan.