Question

How can I extract numbering and text from a .docx file using Java and the Apache POI XWPF library?

I am using the following code:

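import java.io.FileInputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public static void readDocxFile() {
    // try-with-resources closes the stream and the document once, after reading
    try (FileInputStream fis = new FileInputStream("C:\\test.docx");
         XWPFDocument document = new XWPFDocument(fis)) {
        List<XWPFParagraph> paragraphs = document.getParagraphs();

        for (XWPFParagraph para : paragraphs) {
            // getText() prints the paragraph text without the list number
            System.out.println(para.getText());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}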

My code extracts only the text, like this:

CLIENT SERVICE SATISFACTION
Client Feedback System
Interlibrary Loans
Shelf Tidiness
Three Day Loans
Materials Availability Survey
Online help service

I need to extract the section numbers (numbering) along with the text, like this:

1    CLIENT SERVICE SATISFACTION
1.1   Client Feedback System
1.1.1 Interlibrary Loans
1.1.2 Shelf Tidiness
1.1.3 Three Day Loans
1.2   Materials Availability Survey
1.3   Online help service

Answer 1

To get the text of a .docx file, use the XWPFParagraph class (from the poi-ooxml API). To get the numbering information for a paragraph, try the following code:

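// currentPara_Line is an XWPFParagraph. getPPr() and getNumPr() return null
// for paragraphs without numbering, so check for null before chaining.
BigInteger currentParagraphNumberingID = currentPara_Line.getCTP().getPPr().getNumPr().getNumId().getVal();
// Resolve the numbering instance to its abstract numbering definition
BigInteger currentParagraphAbstractNumID2 = currentPara_Line.getDocument().getNumbering().getAbstractNumID(currentParagraphNumberingID);
XWPFAbstractNum currentParagraphAbstractNum = currentPara_Line.getDocument().getNumbering().getAbstractNum(currentParagraphAbstractNumID2);
// The underlying XML bean holds the per-level format definitions
CTAbstractNum currentParagraphAbstractNumFormatting = currentParagraphAbstractNum.getCTAbstractNum();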
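
The code above gives you the numbering definition, not the rendered label such as "1.1.2". The .docx file stores only a numbering ID and a level for each paragraph, and Word computes the visible number when it lays out the document, so you have to count the paragraphs yourself. Below is a minimal sketch of that counting step. OutlineNumberer is a hypothetical helper, not part of Apache POI, and it assumes a single list, decimal numbers that start at 1, and a "1.1.1" style level pattern. With POI, the level would typically come from para.getNumIlvl(), and para.getNumID() returns null for unnumbered paragraphs.

```java
// Hypothetical helper, not part of Apache POI: rebuilds labels such as "1.1.2"
// from a sequence of zero-based list levels.
// Assumptions: one list, decimal format, every level starts at 1.
class OutlineNumberer {
    private static final int MAX_LEVELS = 9; // list levels run from 0 to 8
    private final int[] counters = new int[MAX_LEVELS];

    /** Advances the counter for the given level and returns the label. */
    String next(int level) {
        if (level < 0 || level >= MAX_LEVELS) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        counters[level]++;
        // Deeper levels restart whenever a shallower level advances
        for (int i = level + 1; i < MAX_LEVELS; i++) {
            counters[i] = 0;
        }
        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            if (i > 0) {
                label.append('.');
            }
            // A parent level that was never visited prints as 0 in this sketch
            label.append(counters[i]);
        }
        return label.toString();
    }

    public static void main(String[] args) {
        // Levels as they might be read from getNumIlvl() for the sample document
        int[] levels = {0, 1, 2, 2, 2, 1, 1};
        String[] titles = {
            "CLIENT SERVICE SATISFACTION", "Client Feedback System",
            "Interlibrary Loans", "Shelf Tidiness", "Three Day Loans",
            "Materials Availability Survey", "Online help service"
        };
        OutlineNumberer numberer = new OutlineNumberer();
        for (int i = 0; i < levels.length; i++) {
            System.out.println(numberer.next(levels[i]) + " " + titles[i]);
        }
    }
}
```

For the sample levels this produces the labels 1, 1.1, 1.1.1, 1.1.2, 1.1.3, 1.2 and 1.3. A real document may use other number formats (roman, letters), custom start values, or several independent lists, in which case you would keep one counter set per numbering ID and consult the level definitions in the CTAbstractNum shown above.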
