I am trying to extract Word documents that are embedded in a Word document. I use POIFSFileSystem to navigate the document, and this is what I get:
Try extract embedded files....
found entry: /SummaryInformation
found entry: /DocumentSummaryInformation
found entry: /ObjectPool
found entry: /ObjectPool/_1525700337
found entry: /ObjectPool/_1525700337/CompObj
found entry: /ObjectPool/_1525700337/ObjInfo
found entry: /ObjectPool/_1525700337/Package
found entry: /ObjectPool/_1546953926
found entry: /ObjectPool/_1546953926/ObjInfo
found entry: /ObjectPool/_1546953926/Ole10Native
found entry: /ObjectPool/_1546953926/CompObj
found entry: /ObjectPool/_1549950910
found entry: /ObjectPool/_1549950910/CompObj
found entry: /ObjectPool/_1549950910/ObjInfo
found entry: /ObjectPool/_1549950910/Package
found entry: /ObjectPool/_1570293946
found entry: /ObjectPool/_1570293946/CompObj
found entry: /ObjectPool/_1570293946/ObjInfo
found entry: /ObjectPool/_1570293946/Package
found entry: /ObjectPool/_1581762906
found entry: /ObjectPool/_1581762906/CompObj
found entry: /ObjectPool/_1581762906/ObjInfo
found entry: /ObjectPool/_1581762906/Package
All of these embedded objects are Word documents, except for one, which is a TXT file. Why do they all show up as Package?
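To make the listing easier to read, here is a minimal sketch in plain Java (no POI calls) that groups the logged paths by storage and reports which data stream each storage contains. The class and method names are hypothetical, and treating Package and Ole10Native as the streams that hold the embedded data is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: works only on the path strings printed in the log above.
class EmbeddedEntryGrouper {

    /** Maps each ObjectPool storage name (e.g. "_1525700337") to the stream names listed under it. */
    static Map<String, List<String>> groupByStorage(List<String> paths) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String path : paths) {
            // "/ObjectPool/_1525700337/Package" splits into ["", "ObjectPool", "_1525700337", "Package"]
            String[] parts = path.split("/");
            if (parts.length == 4 && parts[1].equals("ObjectPool")) {
                groups.computeIfAbsent(parts[2], k -> new ArrayList<>()).add(parts[3]);
            }
        }
        return groups;
    }

    /** Assumption: Package or Ole10Native is the stream carrying the embedded data. */
    static String payloadStream(List<String> streams) {
        if (streams.contains("Package")) return "Package";
        if (streams.contains("Ole10Native")) return "Ole10Native";
        return "unknown";
    }

    public static void main(String[] args) {
        // A subset of the paths from the listing above.
        List<String> paths = Arrays.asList(
            "/SummaryInformation",
            "/ObjectPool",
            "/ObjectPool/_1525700337",
            "/ObjectPool/_1525700337/CompObj",
            "/ObjectPool/_1525700337/ObjInfo",
            "/ObjectPool/_1525700337/Package",
            "/ObjectPool/_1546953926",
            "/ObjectPool/_1546953926/ObjInfo",
            "/ObjectPool/_1546953926/Ole10Native",
            "/ObjectPool/_1546953926/CompObj");
        groupByStorage(paths).forEach((storage, streams) ->
            System.out.println(storage + " -> " + payloadStream(streams)));
    }
}
```

Run against the full listing, this would report Package for four of the storages and Ole10Native for _1546953926.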
Update.
I downloaded poi-bin-4.0.0-20180907.zip, but it does not contain org.apache.poi.xwpf.converter.core.jar, which has the XHTMLOptions class (I need it to convert docx to HTML). Where can I find that jar?