Question

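I want to read specific text from a nested table (a table inside a table) in a .docx file. Is there an efficient way to do this in Java, such as XPath traversal or a similar API?

At the moment I am reading the .docx with Java and Apache POI (code snippet below), but this way I have to iterate over every node with the tag 'w:tr' and read each node's text value. Is there a way to retrieve just the data I need quickly, using a search pattern such as XPath? Any input is highly appreciated.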

        // Assumes imports from java.io, java.util.zip, javax.xml.parsers and org.w3c.dom.
        File myFile = new File( "D:\\XLS-Pages\\TestSherwin.docx" );
        ZipFile docxFile = new ZipFile( myFile );
        ZipEntry documentXML = docxFile.getEntry( "word/document.xml" );
        InputStream documentXMLIS = docxFile.getInputStream( documentXML );
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        org.w3c.dom.Document doc = dbf.newDocumentBuilder().parse( documentXMLIS );

        // Collect every table row; each row is then visited to read its text.
        org.w3c.dom.Element tElement = doc.getDocumentElement();
        NodeList n = tElement.getElementsByTagName( "w:tr" );

Answer 1

You can use XPath in docx4j; its support is based on JAXB's support for XPath, which has various limitations.
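
For comparison, a similar query can be run with only the JDK's own javax.xml.xpath API against word/document.xml, without docx4j. The following is a minimal sketch, not the docx4j approach: the class name DocxXPathSketch, the method nestedTableText, and the expression //w:tbl//w:tbl//w:t (text runs inside a table that is itself inside another table) are illustrative assumptions, and the expression would need adjusting to the layout of the actual document.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, named for illustration only.
public class DocxXPathSketch {
    // WordprocessingML main namespace, bound to the "w" prefix below.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every w:t element that sits inside a nested table.
    static List<String> nestedTableText(InputStream documentXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for prefixed XPath steps to match
        Document doc = dbf.newDocumentBuilder().parse(documentXml);

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) {
                return W_NS.equals(uri) ? "w" : null;
            }
            @Override
            public Iterator<String> getPrefixes(String uri) {
                return W_NS.equals(uri)
                        ? Collections.singletonList("w").iterator()
                        : Collections.<String>emptyList().iterator();
            }
        });

        // Assumed expression: a table nested anywhere inside another table.
        NodeList hits = (NodeList) xpath.evaluate(
                "//w:tbl//w:tbl//w:t", doc, XPathConstants.NODESET);

        List<String> texts = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            texts.add(hits.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: DocxXPathSketch <file.docx>");
            return;
        }
        // A .docx file is a ZIP archive; the body lives in word/document.xml.
        try (ZipFile docx = new ZipFile(args[0]);
             InputStream in = docx.getInputStream(docx.getEntry("word/document.xml"))) {
            nestedTableText(in).forEach(System.out::println);
        }
    }
}
```

The parser has to be namespace-aware and the "w" prefix has to be bound explicitly; otherwise a step such as w:tbl matches nothing. This still parses the whole document into a DOM, so the gain over iterating 'w:tr' nodes by hand is mainly in how concisely the target is selected.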
