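I am using the Apache POI utilities (poi-scratchpad-3.9.jar and the related version 3.9 POI jars) to convert .doc files to .txt. This works for most files, but for some of them I get the following exception: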
java.lang.IndexOutOfBoundsException: 0 not accessible in a list of length 0
at org.apache.poi.util.IntList.get(IntList.java:346)
at org.apache.poi.poifs.storage.BlockAllocationTableReader.fetchBlocks(BlockAllocationTableReader.java:224)
at org.apache.poi.poifs.storage.BlockListImpl.fetchBlocks(BlockListImpl.java:123)
at org.apache.poi.poifs.storage.SmallDocumentBlockList.fetchBlocks(SmallDocumentBlockList.java:30)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.processProperties(POIFSFileSystem.java:521)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:163)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
The code is as follows:
fileInputStream = new FileInputStream(file.getAbsolutePath());
// A HWPFDocument used to read the document file from the FileInputStream
HWPFDocument doc = new HWPFDocument(fileInputStream);
// A WordExtractor used to read the textual content from the document
WordExtractor docExtractor = new WordExtractor(doc);
// This array stores each paragraph from the document file.
String[] docArray = docExtractor.getParagraphText();
StringBuilder contents = new StringBuilder();
for (int i = 0; i < docArray.length; i++) {
    if (docArray[i] != null) {
        contents.append(docArray[i]);
        contents.append(System.getProperty("line.separator"));
    }
}
isConverted = FileDirectoryOperations.writeTextOutputFile(targetFilePath, contents.toString());
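The paragraph-joining loop above does not depend on POI itself, so it can be pulled out into a small helper that is easy to test in isolation. The following is a minimal JDK-only sketch. The class ParagraphJoiner and the method joinParagraphs are hypothetical names, and the input is assumed to be the String[] returned by getParagraphText(). Like the original loop, it skips null entries and appends the platform line separator after every remaining paragraph.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helper, not part of POI: joins extracted paragraphs into one text.
final class ParagraphJoiner {
    private ParagraphJoiner() {}

    /**
     * Skips null entries and appends the platform line separator after each
     * remaining paragraph, mirroring the for-loop in the question.
     */
    static String joinParagraphs(String[] paragraphs) {
        String sep = System.lineSeparator();
        return Arrays.stream(paragraphs)
                .filter(Objects::nonNull)
                .map(p -> p + sep)
                .collect(Collectors.joining());
    }
}
```

With this helper, the loop collapses to a single call such as ParagraphJoiner.joinParagraphs(docExtractor.getParagraphText()).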
The exception is thrown at this line:
HWPFDocument doc = new HWPFDocument(fileInputStream);
Is there any workaround for this?
Please share your suggestions.
Thanks in advance.
Sourabh
Answer
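The exception you are getting indicates a problem with the way the underlying OLE2 container is structured.
The older POIFSFileSystem is pickier about the OLE2 structure than the newer (read-only) NPOIFSFileSystem, so you should try switching to that. Your setup code would then be: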
NPOIFSFileSystem fs = new NPOIFSFileSystem(file);
HWPFDocument doc = new HWPFDocument(fs.getRoot());
WordExtractor docExtractor = new WordExtractor(doc);
As a bonus, NPOIFSFileSystem is also slightly faster and uses less memory.
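If a particular file still fails after switching to NPOIFSFileSystem, a quick sanity check is whether it is an OLE2 container at all. Every OLE2 (Compound File Binary) file begins with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. A file without it will be rejected by both POIFSFileSystem and NPOIFSFileSystem, while a file that has it may still have a damaged internal structure, such as the block allocation table that appears in the stack trace. Below is a minimal JDK-only sketch (Java 9+). Ole2Check, startsWithOle2Signature and isOle2File are hypothetical names and are not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical pre-check, not part of POI: does a file look like an OLE2 container?
final class Ole2Check {
    // First 8 bytes of every OLE2 / Compound File Binary container (e.g. legacy .doc files).
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    private Ole2Check() {}

    /** Returns true if the given bytes start with the OLE2 signature. */
    static boolean startsWithOle2Signature(byte[] bytes) {
        int n = OLE2_SIGNATURE.length;
        // Range-based Arrays.equals is available since Java 9.
        return bytes.length >= n && Arrays.equals(bytes, 0, n, OLE2_SIGNATURE, 0, n);
    }

    /** Reads only the first 8 bytes of the file and checks them against the signature. */
    static boolean isOle2File(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2_SIGNATURE.length];
            int read = in.readNBytes(header, 0, header.length); // Java 9+
            return read == header.length && startsWithOle2Signature(header);
        }
    }
}
```

For example, Ole2Check.isOle2File(file.toPath()) could be called before constructing the file system, so that files failing the check are logged and skipped instead of being passed to HWPFDocument.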