java.lang.IndexOutOfBoundsException when converting a doc file to txt with Apache POI

Time: 2014-01-08 10:06:48

Tags: apache text apache-poi indexoutofboundsexception doc

I am using the Apache POI utilities (poi-scratchpad-3.9.jar and the related version 3.9 POI jars) to convert doc files to txt. It works for most files, but I get the following exception:

java.lang.IndexOutOfBoundsException: 0 not accessible in a list of length 0
at org.apache.poi.util.IntList.get(IntList.java:346)
at org.apache.poi.poifs.storage.BlockAllocationTableReader.fetchBlocks(BlockAllocationTableReader.java:224)
at org.apache.poi.poifs.storage.BlockListImpl.fetchBlocks(BlockListImpl.java:123)
at org.apache.poi.poifs.storage.SmallDocumentBlockList.fetchBlocks(SmallDocumentBlockList.java:30)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.processProperties(POIFSFileSystem.java:521)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:163)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)

The code is as follows:

fileInputStream = new FileInputStream(file.getAbsolutePath());

// A HWPFDocument used to read document file from FileInputStream
HWPFDocument doc = new HWPFDocument(fileInputStream);

// A WordExtractor used to read textual content from document
WordExtractor docExtractor = new WordExtractor(doc);

// This Array stores each line from the document file.
String[] docArray = docExtractor.getParagraphText();
StringBuilder contents = new StringBuilder();
for (int i = 0; i < docArray.length; i++) {
    if (docArray[i] != null) {
        contents.append(docArray[i]);
        contents.append(System.getProperty("line.separator"));
    }
}
isConverted = FileDirectoryOperations.writeTextOutputFile(targetFilePath, contents.toString());

We get the exception at the line HWPFDocument doc = new HWPFDocument(fileInputStream);

Is there any workaround for this?

Please share your thoughts.

Thanks in advance.

Sourabh

1 Answer:

Answer 0 (score: 0)

The exception you are getting suggests that something is wrong with the way the underlying OLE2 container is structured.

The older POIFSFileSystem is pickier about OLE2 structure than the newer (read-only) NPOIFSFileSystem, so you should try switching to that. Your setup code would be:

NPOIFSFileSystem fs = new NPOIFSFileSystem(file);
HWPFDocument doc = new HWPFDocument(fs.getRoot());
WordExtractor docExtractor = new WordExtractor(doc);

As a bonus, NPOIFSFileSystem is also slightly faster and uses less memory.
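Since the exception points at the OLE2 container rather than at the Word content, it can also help to first rule out files that are not OLE2 at all (for example RTF or .docx files saved with a .doc extension). The sketch below uses only the JDK and compares the first eight bytes of a file with the OLE2 header signature. The class name Ole2SignatureCheck and its method names are made up for illustration and are not part of POI. A file that passes this check can still have a damaged internal structure, so this is only a rough first filter, not a fix.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper (not a POI class): checks for the OLE2 header signature.
class Ole2SignatureCheck {

    // The eight magic bytes found at the start of an OLE2 compound file.
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the given bytes begin with the OLE2 signature.
    static boolean hasOle2Signature(byte[] header) {
        if (header == null || header.length < OLE2_SIGNATURE.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(header, OLE2_SIGNATURE.length);
        return Arrays.equals(prefix, OLE2_SIGNATURE);
    }

    // Reads the first eight bytes of the file and checks them.
    static boolean hasOle2Signature(File file) throws IOException {
        byte[] header = new byte[OLE2_SIGNATURE.length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(header);
        } catch (EOFException e) {
            // File is shorter than eight bytes, so it cannot be OLE2.
            return false;
        }
        return hasOle2Signature(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java Ole2SignatureCheck <file>");
            return;
        }
        File file = new File(args[0]);
        String verdict = hasOle2Signature(file) ? "starts with" : "does not start with";
        System.out.println(file + " " + verdict + " the OLE2 signature");
    }
}
```

Files that fail the check are not in the binary .doc format, so they would need a different parser than HWPFDocument. Files that pass but still throw are the ones worth retrying with NPOIFSFileSystem as shown above.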