How can I extract section numbers along with the text of a .doc file?

Time: 2010-08-31 03:11:57

Tags: java apache-poi

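I am using Apache POI HWPF to extract text from a .doc file, and I found that the extracted text does not contain the section numbers. Can POI extract the section numbers together with the text?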

import java.io.File;
import java.io.FileInputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public void readDocFile() {
    File docFile = new File("C:\\Documents and Settings\\Administrator\\Desktop\\Topo6.doc");

    // try-with-resources closes the stream even if parsing fails.
    try (FileInputStream fis = new FileInputStream(docFile)) {
        // HWPFDocument parses the binary .doc format from the input stream.
        HWPFDocument doc = new HWPFDocument(fis);
        WordExtractor docExtractor = new WordExtractor(doc);

        // getText() returns the whole document body as a single string.
        // Extraction stays inside the try block so a failed load cannot
        // lead to a NullPointerException on docExtractor.
        String text = docExtractor.getText();
        System.out.println(text);
    } catch (Exception exep) {
        System.out.println(exep.getMessage());
    }
}

1 answer:

Answer 0: (score: 2)

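OK, I figured it out.

The section numbers in a .doc file created in Office Word are generated dynamically rather than stored as part of the text, so I have to get the level of each paragraph and compute the section numbers myself.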
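The counting step can be sketched roughly as follows. This is a minimal illustration, not the code the answer's author used: the class `SectionNumberer` and its methods are hypothetical names, and it assumes each heading's level is already available as a 0-based integer. With HWPF, such a level might be read from `Paragraph.getLvl()` or `Paragraph.getIlvl()` while iterating over `HWPFDocument.getRange()`, but which of the two matches a given document's numbering is an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rebuilds "1", "1.1", "1.2", "2", ... from heading levels.
public class SectionNumberer {
    // counters[i] holds the current number at heading depth i (0-based).
    private final int[] counters;

    public SectionNumberer(int maxDepth) {
        this.counters = new int[maxDepth];
    }

    /** Advances the counter at the given 0-based level and returns its label. */
    public String next(int level) {
        if (level < 0 || level >= counters.length) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        counters[level]++;
        // Deeper levels restart whenever a heading at this level appears.
        for (int i = level + 1; i < counters.length; i++) {
            counters[i] = 0;
        }
        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            if (i > 0) {
                label.append('.');
            }
            label.append(counters[i]);
        }
        return label.toString();
    }

    /** Numbers a whole sequence of heading levels in document order. */
    public static List<String> number(int[] levels, int maxDepth) {
        SectionNumberer numberer = new SectionNumberer(maxDepth);
        List<String> labels = new ArrayList<>();
        for (int level : levels) {
            labels.add(numberer.next(level));
        }
        return labels;
    }

    public static void main(String[] args) {
        // Levels as they might appear for: H1, H2, H2, H1, H2, H3.
        int[] levels = {0, 1, 1, 0, 1, 2};
        System.out.println(number(levels, 9));
        // prints [1, 1.1, 1.2, 2, 2.1, 2.1.1]
    }
}
```

Body paragraphs would be skipped before calling `next`, and each returned label would be prepended to the heading text. A document that jumps over a level (for example from level 0 straight to level 2) yields a label containing a zero, such as "1.0.1", so real documents may need extra handling for that case, as well as for custom numbering formats defined in the list tables.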