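I am using Apache POI HWPF to extract text from a .doc file, and I noticed that the extracted text does not include the chapter numbers. Can POI extract the chapter numbers along with the text?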
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public void readDocFile() {
    File docFile = new File("C:\\Documents and Settings\\Administrator\\Desktop\\Topo6.doc");
    // try-with-resources closes the stream even if parsing fails.
    try (FileInputStream fis = new FileInputStream(docFile)) {
        // HWPFDocument parses the binary .doc format from the stream.
        HWPFDocument doc = new HWPFDocument(fis);
        WordExtractor docExtractor = new WordExtractor(doc);
        // getText() returns the whole document body as a single string.
        String text = docExtractor.getText();
        System.out.println(text);
    } catch (IOException e) {
        // Report the failure here rather than dereferencing a null extractor later.
        System.out.println(e.getMessage());
    }
}
Answer 0 (score: 2)
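OK, I figured it out.
The chapter numbers that Office Word generates in a .doc file are dynamic: Word computes them when it displays the document, so they are not part of the paragraph text. I therefore have to get the level of each paragraph and compute the chapter numbers myself.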
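
The counting step could look roughly like the sketch below. It is not taken from POI: the class `ChapterNumbering` and its method `number` are hypothetical names, and the sketch assumes plain decimal numbering that starts at 1, levels given as 0-based integers in document order, and at most nine levels. If a document skips a level (for example, jumps from level 0 straight to level 2), this sketch emits a 0 for the skipped level, which may differ from what Word shows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a sequence of heading levels into "1", "1.1", "1.2", "2", ...
final class ChapterNumbering {

    // Assumption: Word list levels run from 0 to 8.
    private static final int MAX_LEVELS = 9;

    /**
     * @param levels 0-based level of each numbered paragraph, in document order
     * @return one dotted label per input paragraph
     */
    static List<String> number(int[] levels) {
        int[] counters = new int[MAX_LEVELS];
        List<String> labels = new ArrayList<>();
        for (int level : levels) {
            // Advance the counter for this level.
            counters[level]++;
            // A new item at this level restarts every deeper level.
            for (int i = level + 1; i < MAX_LEVELS; i++) {
                counters[i] = 0;
            }
            // Join the counters from the top level down to the current one.
            StringBuilder label = new StringBuilder();
            for (int i = 0; i <= level; i++) {
                if (i > 0) {
                    label.append('.');
                }
                label.append(counters[i]);
            }
            labels.add(label.toString());
        }
        return labels;
    }
}
```

With HWPF, the levels would probably come from iterating the paragraphs of `doc.getRange()` and, for paragraphs where `isInList()` is true, reading `getIlvl()`. Whether that matches a given document depends on how its headings were numbered, so treat it as a starting point. Each label can then be prepended to the corresponding paragraph text.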