Question

How can I read images from an MS Office .doc file using Apache POI? I tried the following code, but it does not work.

try {
    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("C:\\DATASTORE\\ImageDocument.doc"));
    Document document = new Document();
    OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/ImageDocumentPDF.pdf"));
    PdfWriter.getInstance(document, fileOutput);
    document.open();

    HWPFDocument hdocument = new HWPFDocument(fs);
    Range range = hdocument.getOverallRange();
    PicturesTable picture = hdocument.getPicturesTable();
    // The loop bound is the paragraph count, but the index selects character runs
    for (int i = 0; i < range.numParagraphs(); i++) {
        CharacterRun run = range.getCharacterRun(i);
        int picoffset = run.getPicOffset();
        if (picture.hasPicture(run)) {
            Picture pic = picture.extractPicture(run, true);
            byte[] picturearray = pic.getContent();
            com.itextpdf.text.Image image = com.itextpdf.text.Image.getInstance(picturearray);
            document.add(image);
        }
    }
    document.close();
} catch (Exception e) {
    e.printStackTrace();
}

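When I run the code above and print the picture offset, it shows -1, and picture.hasPicture(run) returns false even though the input file contains images.

Please help me find a solution. Thanks.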

Answer 1

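import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

// getMimeType(file) is a helper from the answerer's own code and is not shown here;
// it is assumed to report the file extension ("doc" or "docx").
public static List<byte[]> extractImagesFromWord(File file) {
    if (file.exists()) {
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream in = new FileInputStream(file)) {
            List<byte[]> result = new ArrayList<byte[]>();
            if ("docx".equals(getMimeType(file).getExtension())) {
                // .docx (OOXML): XWPF lists every embedded picture directly
                org.apache.poi.xwpf.usermodel.XWPFDocument doc = new org.apache.poi.xwpf.usermodel.XWPFDocument(in);
                for (org.apache.poi.xwpf.usermodel.XWPFPictureData picture : doc.getAllPictures()) {
                    result.add(picture.getData());
                }
            } else if ("doc".equals(getMimeType(file).getExtension())) {
                // .doc (binary): take all pictures from the pictures table
                // instead of probing individual character runs
                org.apache.poi.hwpf.HWPFDocument doc = new org.apache.poi.hwpf.HWPFDocument(in);
                for (org.apache.poi.hwpf.usermodel.Picture picture : doc.getPicturesTable().getAllPictures()) {
                    result.add(picture.getContent());
                }
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    // The file does not exist
    return null;
}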
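
The method above relies on getMimeType(file).getExtension(), which is not defined anywhere in the answer. As a rough sketch only, a stand-in could derive the extension from the file name. The class name WordFileType and the method extensionOf are hypothetical names introduced here for illustration; they are not part of Apache POI or of the original answer.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper (not from the original answer, not part of Apache POI):
// decides between "doc" and "docx" by looking at the file name only.
public final class WordFileType {
    private WordFileType() {}

    /** Returns the lower-cased extension of the file name, or "" if there is none. */
    public static String extensionOf(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        // No dot, only a leading dot (".hidden"), or a trailing dot: no usable extension.
        if (dot <= 0 || dot == name.length() - 1) {
            return "";
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

With this in place, the two checks could read "docx".equals(WordFileType.extensionOf(file)) and "doc".equals(WordFileType.extensionOf(file)). A name-based check can be fooled by a renamed file, so it may be worth catching the exception POI throws for a mismatched format as well.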

Answer 2

It works for me. If picOffset returns -1, it means the current CharacterRun does not contain an image.
