How can I read images from an MS Office .doc file using Apache POI? I tried the following code, but it does not work.
try {
    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("C:\\DATASTORE\\ImageDocument.doc"));
    Document document = new Document();
    OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/ImageDocumentPDF.pdf"));
    PdfWriter.getInstance(document, fileOutput);
    document.open();
    HWPFDocument hdocument = new HWPFDocument(fs);
    Range range = hdocument.getOverallRange();
    PdfPTable createTable;
    CharacterRun run;
    PicturesTable picture = hdocument.getPicturesTable();
    int picoffset = run.getPicOffset();
    for (int i = 0; i < range.numParagraphs(); i++) {
        run = range.getCharacterRun(i);
        if (picture.hasPicture(run)) {
            Picture pic = picture.extractPicture(run, true);
            byte[] picturearray = pic.getContent();
            com.itextpdf.text.Image image = com.itextpdf.text.Image.getInstance(picturearray);
            document.add(image);
        }
    }
}
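When I run the code above and print the picture offset, it shows -1, and picture.hasPicture(run) returns false even though the input file contains images.
Please help me find a solution. Thanks.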
Answer 0: (score: 2)
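// Returns the raw bytes of every image embedded in a .doc or .docx file.
// getMimeType(...) is the answerer's own helper and is not shown here.
public static List<byte[]> extractImagesFromWord(File file) {
    if (!file.exists()) {
        return null; // as in the original: a missing file yields null, not an empty list
    }
    // try-with-resources closes the input stream even if POI throws
    try (InputStream in = new FileInputStream(file)) {
        List<byte[]> result = new ArrayList<byte[]>();
        String extension = getMimeType(file).getExtension();
        if ("docx".equals(extension)) {
            // OOXML (.docx): XWPF exposes all embedded pictures directly
            org.apache.poi.xwpf.usermodel.XWPFDocument doc = new org.apache.poi.xwpf.usermodel.XWPFDocument(in);
            for (org.apache.poi.xwpf.usermodel.XWPFPictureData picture : doc.getAllPictures()) {
                result.add(picture.getData());
            }
        } else if ("doc".equals(extension)) {
            // Binary (.doc): HWPF lists the pictures through the PicturesTable
            org.apache.poi.hwpf.HWPFDocument doc = new org.apache.poi.hwpf.HWPFDocument(in);
            for (org.apache.poi.hwpf.usermodel.Picture picture : doc.getPicturesTable().getAllPictures()) {
                result.add(picture.getContent());
            }
        }
        return result;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}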
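The answer above branches on a getMimeType(file).getExtension() helper that is not shown. As a rough stand-in, the two containers can be told apart by their leading bytes: binary .doc files are OLE2 compound files and .docx files are ZIP packages. The sketch below uses only the JDK. The class name WordFormatSniffer and the method detectWordFormat are hypothetical names chosen for illustration and are not part of Apache POI. Note that it identifies only the container, so an .xls file would also report "doc" and any ZIP archive would report "docx".

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper (not part of Apache POI): guesses the Word container
// format from the first bytes of a file.
public class WordFormatSniffer {
    // OLE2 compound file signature, used by binary .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature ("PK\3\4"), used by .docx packages.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns "doc", "docx", or "unknown" based on the leading bytes.
    public static String detectWordFormat(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "doc";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "docx";
        }
        return "unknown";
    }

    // Reads up to 8 bytes from the file and classifies them.
    public static String detectWordFormat(File file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int read = 0;
        try (InputStream in = new FileInputStream(file)) {
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                read += n;
            }
        }
        return detectWordFormat(Arrays.copyOf(header, read));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a helper like this, the two branches could test "doc".equals(WordFormatSniffer.detectWordFormat(file)) instead of relying on the file extension, which also copes with files that were renamed.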
Answer 1: (score: 0)
It works for me. If picOffset returns -1, it means the current CharacterRun does not contain an image.