How to read values from a table in a Word doc using C#

Time: 2017-08-02 16:31:38

Tags: c# openxml-sdk

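I am trying to connect to a Microsoft Word document (.docx) to read values from a table inside it. I am using Open XML SDK 2.0 to open the .docx file. After looking for examples and ideas, this is what I have so far: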

    public static string TextFromWord(string filename)
    {
        const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

        StringBuilder textBuilder = new StringBuilder();
        using (WordprocessingDocument wDoc = WordprocessingDocument.Open(filename, false))
        {
            // Manage namespaces to perform XPath queries.
            NameTable nt = new NameTable();
            XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
            nsManager.AddNamespace("w", wordmlNamespace);

            // Get the document part from the package.
            // Load the XML in the document part into an XmlDocument instance.
            XmlDocument xdoc = new XmlDocument(nt);
            xdoc.Load(wDoc.MainDocumentPart.GetStream());

            // Collect the text of every paragraph, one paragraph per line.
            XmlNodeList paragraphNodes = xdoc.SelectNodes("//w:p", nsManager);
            foreach (XmlNode paragraphNode in paragraphNodes)
            {
                XmlNodeList textNodes = paragraphNode.SelectNodes(".//w:t", nsManager);
                foreach (XmlNode textNode in textNodes)
                    textBuilder.Append(textNode.InnerText);
                textBuilder.Append(Environment.NewLine);
            }
        }
        return textBuilder.ToString();
    }


1 answer:

Answer 0 (score: 3)

Try the following simple rewrite of your method. It replaces your System.Xml calls and namespace handling with Open XML elements (Document, Body, Paragraph, Table, Row, Cell, Descendants, etc.). Please install and use the Open XML SDK 2.5.

    // Needs System.Linq, System.Text, DocumentFormat.OpenXml.Packaging and DocumentFormat.OpenXml.Wordprocessing.
    public static string TextFromWord(string filename)
    {
        StringBuilder textBuilder = new StringBuilder();
        using (WordprocessingDocument wDoc = WordprocessingDocument.Open(filename, false))
        {
            // The first descendant of the Document element is normally its Body.
            var parts = wDoc.MainDocumentPart.Document.Descendants().FirstOrDefault();
            if (parts != null)
            {
                foreach (var node in parts.ChildElements)
                {
                    if (node is Paragraph)
                    {
                        ProcessParagraph((Paragraph)node, textBuilder);
                        textBuilder.AppendLine();
                    }
                    else if (node is Table)
                        ProcessTable((Table)node, textBuilder);
                }
            }
        }
        return textBuilder.ToString();
    }

    private static void ProcessTable(Table node, StringBuilder textBuilder)
    {
        foreach (var row in node.Descendants<TableRow>())
        {
            textBuilder.Append("| ");
            foreach (var cell in row.Descendants<TableCell>())
            {
                foreach (var para in cell.Descendants<Paragraph>())
                    ProcessParagraph(para, textBuilder);
                textBuilder.Append(" | ");
            }
            textBuilder.AppendLine();
        }
    }

    private static void ProcessParagraph(Paragraph node, StringBuilder textBuilder)
    {
        foreach (var text in node.Descendants<Text>())
            textBuilder.Append(text.InnerText);
    }

Note: this code only works for simple Word documents made up of paragraphs and tables. It has not been tested on complex Word documents.
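
Since the question asks for the values in a table rather than the text of the whole document, a variation that keeps rows and cells separate may be easier to work with. The following is only a sketch under the same assumptions as the code above (Open XML SDK 2.5, a simple document). `WordTableReader` and `ReadTables` are hypothetical names introduced here for illustration; they are not part of the SDK.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class WordTableReader
{
    // Hypothetical helper (not an SDK method): returns every top-level table
    // in the document body as a list of rows, each row a list of cell strings.
    public static List<List<List<string>>> ReadTables(string filename)
    {
        var tables = new List<List<List<string>>>();
        using (WordprocessingDocument wDoc = WordprocessingDocument.Open(filename, false))
        {
            Body body = wDoc.MainDocumentPart.Document.Body;

            // Elements<T>() only looks at direct children, so a table nested
            // inside a cell is not listed as a separate top-level table.
            foreach (Table table in body.Elements<Table>())
            {
                var rows = new List<List<string>>();
                foreach (TableRow row in table.Elements<TableRow>())
                {
                    // InnerText concatenates all text inside the cell with no
                    // separator between paragraphs (nested tables included).
                    rows.Add(row.Elements<TableCell>()
                                .Select(cell => cell.InnerText)
                                .ToList());
                }
                tables.Add(rows);
            }
        }
        return tables;
    }
}
```

With this shape, `ReadTables(path)[0][1][2]` would be the third cell of the second row of the first table (all indices are zero-based). Rows or cells wrapped in content controls are not direct children of their table or row, so this sketch would skip them; switching to `Descendants<T>()` picks those up at the cost of also picking up nested tables.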

