Question

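I am uploading .docx files to Google App Engine with Apache Commons FileUpload, as described in the linked "File upload servlet" question. During the upload I also want to extract the text using the Apache POI library.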

If I pass this stream to the POI API:

 InputStream stream = item.openStream();

I get the following exception:

java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream

public static String docx2text(InputStream is) throws Exception {
    return ExtractorFactory.createExtractor(is).getText();
}

The file I am uploading is a valid .docx file. The POI API works fine if I pass it a FileInputStream instead:

FileInputStream fs=new FileInputStream(new File("C:\\docs\\mydoc.docx"));

Answer 1

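I don't know POI's internal implementation, but my guess is that it needs a seekable stream. The stream returned by a servlet (and network streams in general) is not seekable.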

Try reading the whole content first and then wrapping it in a ByteArrayInputStream:

byte[] bytes = getBytes(item.openStream());
InputStream stream = new ByteArrayInputStream(bytes);

public static byte[] getBytes(InputStream is) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    int len;
    byte[] data = new byte[100000];
    while ((len = is.read(data, 0, data.length)) != -1) {
        buffer.write(data, 0, len);
    }

    buffer.flush();
    return buffer.toByteArray();
}
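
To illustrate the guess above, here is a minimal sketch using only the JDK. It assumes that format detection needs to peek at the leading bytes and then rewind, which is an assumption about why buffering helps, not a statement about POI's actual code. The class name HeaderPeekDemo and its helpers are hypothetical; the forwardOnly stream merely simulates an upload stream that cannot be rewound. The only format fact used is that a .docx file is a ZIP archive, so it starts with the bytes 'P', 'K', 0x03, 0x04.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class HeaderPeekDemo {

    // Hypothetical stand-in for a network stream: forward-only, no mark/reset.
    static InputStream forwardOnly(byte[] content) {
        return new FilterInputStream(new ByteArrayInputStream(content)) {
            @Override public boolean markSupported() { return false; }
            @Override public void mark(int readLimit) { /* ignored */ }
            @Override public void reset() throws IOException {
                throw new IOException("mark/reset not supported");
            }
        };
    }

    // Copy the whole stream into memory, as getBytes does above.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int len;
        while ((len = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, len);
        }
        return buffer.toByteArray();
    }

    // Peek at the first four bytes, then rewind so the caller sees the full stream.
    static boolean looksLikeZip(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream cannot be rewound");
        }
        in.mark(4);
        byte[] head = new byte[4];
        int n = in.read(head, 0, 4);
        in.reset();
        return n == 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeDocx = {'P', 'K', 3, 4, 0, 0};   // ZIP signature plus padding
        InputStream raw = forwardOnly(fakeDocx);
        System.out.println("raw markSupported: " + raw.markSupported());
        InputStream buffered = new ByteArrayInputStream(readFully(raw));
        System.out.println("buffered looks like ZIP: " + looksLikeZip(buffered));
    }
}
```

The in-memory copy can be inspected and rewound as often as needed, whereas the forward-only stream cannot. The cost is that the whole upload is held in memory at once.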

Answer 2

Problem solved:

    while (iterator.hasNext()) {  //Apache commons file upload code
      FileItemStream item = iterator.next();
      InputStream stream = item.openStream();
      ByteArrayInputStream bs=new ByteArrayInputStream(IOUtils.toByteArray(stream));
      POITextExtractor extractor = ExtractorFactory.createExtractor(bs); 
      System.out.println(extractor.getText());
    }
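
If Commons IO is not on the classpath, the same buffering step can be sketched with the JDK alone. This assumes a Java 9 or later runtime, where InputStream.readAllBytes() is available; the class and method names below are hypothetical and not part of any library. The try-with-resources block also closes the upload stream once it has been copied.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class UploadBuffering {

    // Reads the upload completely, closes the source stream, and returns
    // an in-memory copy that supports mark/reset.
    static ByteArrayInputStream bufferAndClose(InputStream upload) throws IOException {
        try (InputStream in = upload) {
            return new ByteArrayInputStream(in.readAllBytes());  // Java 9+
        }
    }
}
```

In the loop above, the result of bufferAndClose(item.openStream()) would take the place of bs. As with IOUtils.toByteArray, the entire file sits in memory, which is reasonable for typical documents but worth keeping in mind for large uploads.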
