Question

I am trying to extract embedded Word documents from a Word document. I use POIFSFileSystem to navigate the document, and this is what I get:

Try extract embedded files....
found entry: /SummaryInformation
found entry: /DocumentSummaryInformation
found entry: /ObjectPool
found entry: /ObjectPool/_1525700337
found entry: /ObjectPool/_1525700337/CompObj
found entry: /ObjectPool/_1525700337/ObjInfo
found entry: /ObjectPool/_1525700337/Package
found entry: /ObjectPool/_1546953926
found entry: /ObjectPool/_1546953926/ObjInfo
found entry: /ObjectPool/_1546953926/Ole10Native
found entry: /ObjectPool/_1546953926/CompObj
found entry: /ObjectPool/_1549950910
found entry: /ObjectPool/_1549950910/CompObj
found entry: /ObjectPool/_1549950910/ObjInfo
found entry: /ObjectPool/_1549950910/Package
found entry: /ObjectPool/_1570293946
found entry: /ObjectPool/_1570293946/CompObj
found entry: /ObjectPool/_1570293946/ObjInfo
found entry: /ObjectPool/_1570293946/Package
found entry: /ObjectPool/_1581762906
found entry: /ObjectPool/_1581762906/CompObj
found entry: /ObjectPool/_1581762906/ObjInfo
found entry: /ObjectPool/_1581762906/Package

But all of these embedded objects are Word documents, and one of them is a TXT file. Why do they all show up as Package entries?
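One way to check what each embedded stream really holds is to look at its leading bytes. The sketch below is only an illustration: `StreamSniffer` and its methods are hypothetical names, not part of Apache POI, and it assumes the stream's bytes have already been read into an array (for example from a Package entry). It relies on two well-known signatures: ZIP archives, which is what .docx files are, begin with `PK\x03\x04`, and OLE2 compound files, which is what binary .doc files are, begin with `D0 CF 11 E0 A1 B1 1A E1`.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Apache POI): guesses a container format
// from the leading bytes of an embedded stream.
class StreamSniffer {
    // ZIP local file header signature "PK\3\4"; .docx files are ZIP archives.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 compound file signature; binary .doc files start with it.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True when data begins with exactly the bytes in magic.
    static boolean startsWith(byte[] data, byte[] magic) {
        return data.length >= magic.length
            && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Returns "zip", "ole2", or "unknown" for the given stream content.
    static String sniff(byte[] data) {
        if (startsWith(data, ZIP_MAGIC)) return "zip";
        if (startsWith(data, OLE2_MAGIC)) return "ole2";
        return "unknown";
    }

    public static void main(String[] args) {
        // Made-up sample: the first bytes of a ZIP-based file.
        byte[] sample = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(sniff(sample));
    }
}
```

If a Package stream reports "zip", its content is probably an OOXML file and could be handed to a ZIP-aware reader; plain text would report "unknown" with this check.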

Update.

I downloaded poi-bin-4.0.0-20180907.zip, but it does not contain org.apache.poi.xwpf.converter.core.jar with the XHTMLOptions class (I need it to convert docx to html). Where can I find the jar?

Extracting embedded Word documents from a Word document (Apache POI 4.0.0)

0 answers: