How can I extract numbering and text from a .docx file using Java and the Apache POI XWPF library?
I am using the following code:
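import java.io.FileInputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public static void readDocxFile() {
    // try-with-resources closes the stream and the document once, after the loop
    // (the original closed the stream inside the loop, on every iteration)
    try (FileInputStream fis = new FileInputStream("C:\\test.docx");
         XWPFDocument document = new XWPFDocument(fis)) {
        List<XWPFParagraph> paragraphs = document.getParagraphs();
        for (XWPFParagraph para : paragraphs) {
            // getText() returns the paragraph text only, without the generated list number
            System.out.println(para.getText());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}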
My code extracts only the text, as shown below:
CLIENT SERVICE SATISFACTION
Client Feedback System
Interlibrary Loans
Shelf Tidiness
Three Day Loans
Materials Availability Survey
Online help service
I need to extract the section numbers (numbering) together with the text, as shown below:
1 CLIENT SERVICE SATISFACTION
1.1 Client Feedback System
1.1.1 Interlibrary Loans
1.1.2 Shelf Tidiness
1.1.3 Three Day Loans
1.2 Materials Availability Survey
1.3 Online help service
Answer 0 (score: 0)
To get the text of the .docx file, use the XWPFParagraph methods (from the poi-ooxml API). To get the paragraph's numbering, try the following code:
// currentPara_Line is an XWPFParagraph; getPPr() and getNumPr() return null for unnumbered paragraphs, so check them before chaining
BigInteger currentParagraphNumberingID = currentPara_Line.getCTP().getPPr().getNumPr().getNumId().getVal();
// Resolve the numbering instance to its abstract definition, which holds the per-level formats
BigInteger currentParagraphAbstractNumID2 = currentPara_Line.getDocument().getNumbering().getAbstractNumID(currentParagraphNumberingID);
XWPFAbstractNum currentParagraphAbstractNum = currentPara_Line.getDocument().getNumbering().getAbstractNum(currentParagraphAbstractNumID2);
CTAbstractNum currentParagraphAbstractNumFormatting = currentParagraphAbstractNum.getCTAbstractNum();
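The snippet above stops at the abstract numbering definition and does not yet produce labels such as 1.1.2. Those labels are not stored as text in the file; they have to be computed from each paragraph's list level. Below is a minimal sketch of that counting step, written in plain Java without POI so that it runs on its own. OutlineNumberer is a hypothetical helper, not part of Apache POI, and it assumes a single decimal multilevel list whose counters start at 1 and are joined with dots; real documents may use other number formats, start values, or several independent lists. With POI, the level for each paragraph could probably be taken from XWPFParagraph.getNumIlvl(), which returns null for unnumbered paragraphs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a POI class): turns a sequence of list levels
// into outline labels, assuming decimal numbering that starts at 1.
final class OutlineNumberer {
    // counters.get(i) is the current count at nesting level i (0-based).
    private final List<Integer> counters = new ArrayList<>();

    // Advances the counter at the given level and returns a label such as "1.1.2".
    String next(int level) {
        if (level < 0) {
            throw new IllegalArgumentException("level must be >= 0");
        }
        // Drop counters of deeper levels so they restart under the new parent.
        while (counters.size() > level + 1) {
            counters.remove(counters.size() - 1);
        }
        // Assumption: levels skipped on the way down are treated as starting at 1.
        while (counters.size() < level) {
            counters.add(1);
        }
        if (counters.size() == level) {
            counters.add(1); // first item at this level
        } else {
            counters.set(level, counters.get(level) + 1); // next sibling
        }
        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            if (i > 0) {
                label.append('.');
            }
            label.append(counters.get(i));
        }
        return label.toString();
    }

    public static void main(String[] args) {
        // Levels and texts mirror the sample headings from the question.
        int[] levels = {0, 1, 2, 2, 2, 1, 1};
        String[] texts = {
            "CLIENT SERVICE SATISFACTION", "Client Feedback System",
            "Interlibrary Loans", "Shelf Tidiness", "Three Day Loans",
            "Materials Availability Survey", "Online help service"
        };
        OutlineNumberer numberer = new OutlineNumberer();
        for (int i = 0; i < levels.length; i++) {
            System.out.println(numberer.next(levels[i]) + " " + texts[i]);
        }
    }
}
```

Run against the sample headings, main prints the seven lines from the question, from "1 CLIENT SERVICE SATISFACTION" through "1.3 Online help service". Keeping one OutlineNumberer per numbering ID would be one way to handle documents that contain more than one list.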