Extracting images from a PPTX with Apache POI on Google App Engine using Java

Time: 2015-04-11 03:15:51

Tags: java google-app-engine apache-poi

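I am trying to use the Google Images API together with the Apache POI API, but I get this exception: org.apache.poi.POIXMLException: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]

My source code is as follows: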

import com.google.appengine.api.blobstore.BlobstoreInputStream;
import com.google.appengine.api.images.Image;
import com.google.appengine.api.images.ImagesService;
import com.google.appengine.api.images.ImagesServiceFactory;

 XMLSlideShow ppt= new XMLSlideShow(BlobstoreInputStream);

 //getting the dimensions and size of the slide 
 //Dimension pgsize = ppt.getPageSize();
 XSLFSlide[] slide = ppt.getSlides();    
 for (int i = 0; i < slide.length; i++) {

     PackagePart part= slide[i].getPackagePart();
     OutputStream outputStream = part.getOutputStream();
     ImagesService imagesService = ImagesServiceFactory.getImagesService();
     ppt.write(outputStream);
     ByteArrayOutputStream bout=(((ByteArrayOutputStream) outputStream));
     Image img = ImagesServiceFactory.makeImage(bout.toByteArray());

     /* BufferedImage img = new BufferedImage(pgsize.width, pgsize.height,BufferedImage.TYPE_INT_RGB);
    Graphics2D graphics = img.createGraphics();

    //clear the drawing area
    graphics.setPaint(Color.white);
    graphics.fill(new Rectangle2D.Float(0, 0, pgsize.width, pgsize.height));

    //render
    slide[i].draw(graphics);
      ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
  */ 

    byte[] nimg=img.getImageData();

2 answers:

Answer 0 (score: 0)

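If you only need the pictures from the PPTX, you do not need Apache POI at all. A .pptx file is just a zip archive, and the images are stored under ppt/media/*, so you can simply open the zip and read all the images from that folder.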
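
A minimal sketch of that approach, using only `java.util.zip` from the JDK. The class name `PptxImageExtractor` and the method `extractImages` are hypothetical names chosen for illustration, and the `ppt/media/` prefix assumes the standard PowerPoint package layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: reads a .pptx as a plain zip and collects its media files.
public class PptxImageExtractor {

    // Assumed location of embedded media in a standard .pptx package.
    private static final String MEDIA_PREFIX = "ppt/media/";

    /** Returns a map of zip entry name to raw bytes for every file under ppt/media/. */
    public static Map<String, byte[]> extractImages(InputStream pptx) throws IOException {
        Map<String, byte[]> images = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Skip directories and everything outside the media folder.
                if (entry.isDirectory() || !entry.getName().startsWith(MEDIA_PREFIX)) {
                    continue;
                }
                // Copy the current entry's bytes into memory.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                images.put(entry.getName(), out.toByteArray());
            }
        }
        return images;
    }
}
```

On App Engine, the input could be a `BlobstoreInputStream` opened on the uploaded blob, and each returned byte array could then be passed to `ImagesServiceFactory.makeImage(byte[])`. Media entries are not guaranteed to be in a format the Images service accepts (a deck may embed EMF or WMF files, for example), so it is probably worth filtering by file extension first.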


Answer 1 (score: 0)

The exception "org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]" is thrown when the file is encrypted (password protected) or corrupted.

You should first decrypt the file as described in [XML-based formats - Decryption][1], or make sure the file is not corrupted.

[1]: https://poi.apache.org/encryption.html
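
To tell these cases apart before handing the stream to POI, one option is to look at the first bytes of the file. A plain .pptx is a zip archive and starts with `PK\x03\x04`, whereas a password-protected OOXML file is wrapped in an OLE2 compound document and starts with `D0 CF 11 E0 A1 B1 1A E1`. The sketch below uses only the JDK; the class `PptxContainerSniffer`, its `Kind` enum, and the method `sniff` are hypothetical names for illustration, not part of POI.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper: classifies a stream by its magic bytes without consuming it.
public class PptxContainerSniffer {

    public enum Kind { ZIP, OLE2, UNKNOWN }

    // Signature of an OLE2 compound document (used to wrap encrypted OOXML files).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    /**
     * Peeks at up to 8 bytes and pushes them back, so the same stream can be
     * read from the start afterwards. The stream needs a pushback buffer of
     * at least 8 bytes.
     */
    public static Kind sniff(PushbackInputStream in) throws IOException {
        byte[] header = new byte[8];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n == -1) {
                break;
            }
            read += n;
        }
        if (read > 0) {
            in.unread(header, 0, read);
        }
        // Zip local file header: 'P' 'K' 0x03 0x04.
        if (read >= 4 && header[0] == 'P' && header[1] == 'K' && header[2] == 3 && header[3] == 4) {
            return Kind.ZIP;
        }
        if (read == OLE2_MAGIC.length) {
            boolean ole2 = true;
            for (int i = 0; i < OLE2_MAGIC.length; i++) {
                if ((header[i] & 0xFF) != OLE2_MAGIC[i]) {
                    ole2 = false;
                    break;
                }
            }
            if (ole2) {
                return Kind.OLE2;
            }
        }
        return Kind.UNKNOWN;
    }
}
```

A result of `OLE2` suggests the file is encrypted (or is an old binary .ppt) and needs the decryption route from the linked page before `XMLSlideShow` can open it. `UNKNOWN` suggests the data is corrupted or is not a presentation at all.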

I hope this helps.