Conversion API NPE when converting a PDF to plain text

Time: 2012-08-20 19:31:42

Tags: java google-app-engine

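I'm writing some code that validates PDFs using GAE's Conversion API. Currently I get a NullPointerException when running "ConversionResult result = service.convert(conv);". I tried to replicate the code from the tutorial (https://developers.google.com/appengine/docs/java/conversion/overview), but I had to improvise the part that builds the Asset object from the BlobStore. I've pasted the code and the stack trace below; any ideas what might be wrong? I tried wrapping the call in try/catch, but that only makes the method fail silently. I also can't use the built-in error code method, because the ConversionResult object is never created. I've been searching for a fix, and although I found a few posts describing similar problems, none of them had a solution. Everyone also seems to be using the same sample code I linked above; does Google really have no further documentation on conversion?

Thanks for your help!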

public static void parse(String key, BlobKey bkey) throws IOException {

        BlobstoreInputStream in = new BlobstoreInputStream(bkey);
        byte[] attachmentData = IOUtils.toByteArray(in);
        in.read(attachmentData);

        System.out.println(attachmentData.toString());
        System.out.println(attachmentData.length);
        System.out.println("parse(): blob fetched");

        //Prep for conversion
        Asset fileAsset = new Asset("application/pdf", attachmentData);
        System.out.println(fileAsset.getData().toString());
        Document pdfDoc = new Document(fileAsset);

        ConversionOptions options = ConversionOptions.Builder
                .withOcrInputLanguage("en");
        Conversion conv = new Conversion(pdfDoc, "text/plain", options);
        if (conv.equals(null))
            System.out.println("Conversion is null!");

        // Actual conversion (takes a while!)
        ConversionService service = ConversionServiceFactory.getConversionService();
        if (service.equals(null))
            System.out.println("Service is null!");
        // Fails below!!
        // ¯\(°_o)/¯
        ConversionResult result = service.convert(conv);
        ConversionErrorCode err = result.getErrorCode();
        System.out.println(err.toString());

        // Check for success, return conversion as String
        System.out.println("parse(): 7");
        if (result.success()) {
            System.out.println("parse(): 8");
            // Usually, there will only be 1 asset, but running it through a for loop to be sure
            System.out.println("parse(): 9");
            for (Asset asset : result.getOutputDoc().getAssets()) {
                System.out.println("parse(): 10");
                String text = new String(asset.getData());
                System.out.println("parse(): 11");
                System.out.println(text);

            }
        }
        else { //PDF not converted
            System.out.println("Error: PDF not converted");
            // Maybe add more error handling

        }
}

Warning

Aug 20, 2012 3:10:55 PM com.google.apphosting.utils.jetty.JettyLogger warn
WARNING: /verifyPDF.jsp
java.lang.NullPointerException
    at com.google.appengine.api.conversion.ConversionServicePb$AssetInfo$Builder.setName(ConversionServicePb.java:886)
    at com.google.appengine.api.conversion.AssetProtoConverter.doForward(AssetProtoConverter.java:30)
    at com.google.appengine.api.conversion.AssetProtoConverter.doForward(AssetProtoConverter.java:17)
    at com.google.appengine.repackaged.com.google.common.base.Converter.convert(Converter.java:52)
    at com.google.appengine.api.conversion.DocumentProtoConverter.doForward(DocumentProtoConverter.java:33)
    at com.google.appengine.api.conversion.DocumentProtoConverter.doForward(DocumentProtoConverter.java:18)
    at com.google.appengine.repackaged.com.google.common.base.Converter.convert(Converter.java:52)
    at com.google.appengine.api.conversion.ConversionProtoConverter.doForward(ConversionProtoConverter.java:38)
    at com.google.appengine.api.conversion.ConversionProtoConverter.doForward(ConversionProtoConverter.java:16)
    at com.google.appengine.repackaged.com.google.common.base.Converter.convert(Converter.java:52)
    at com.google.appengine.api.conversion.ConversionRequestProtoConverter.doForward(ConversionRequestProtoConverter.java:40)
    at com.google.appengine.api.conversion.ConversionRequestProtoConverter.doForward(ConversionRequestProtoConverter.java:19)
    at com.google.appengine.repackaged.com.google.common.base.Converter.convert(Converter.java:52)
    at com.google.appengine.api.conversion.ConversionServiceImpl.convertAsync(ConversionServiceImpl.java:94)
    at com.google.appengine.api.conversion.ConversionServiceImpl.convert(ConversionServiceImpl.java:66)
    at com.google.appengine.api.conversion.ConversionServiceImpl.convert(ConversionServiceImpl.java:59)
    at coupflipsite.PDFVerify.parse(PDFVerify.java:73)
    at org.apache.jsp.verifyPDF_jsp._jspService(verifyPDF_jsp.java:65)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
    at com.google.appengine.tools.development.PrivilegedJspServlet.access$101(PrivilegedJspServlet.java:23)
    at com.google.appengine.tools.development.PrivilegedJspServlet$2.run(PrivilegedJspServlet.java:59)
    at java.security.AccessController.doPrivileged(Native Method)
    at com.google.appengine.tools.development.PrivilegedJspServlet.service(PrivilegedJspServlet.java:57)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at com.google.appengine.tools.development.HeaderVerificationFilter.doFilter(HeaderVerificationFilter.java:35)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.google.appengine.api.blobstore.dev.ServeBlobFilter.doFilter(ServeBlobFilter.java:60)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter(TransactionCleanupFilter.java:43)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.google.appengine.tools.development.StaticFileFilter.doFilter(StaticFileFilter.java:125)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.google.appengine.tools.development.BackendServersFilter.doFilter(BackendServersFilter.java:97)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at com.google.appengine.tools.development.DevAppEngineWebAppContext.handle(DevAppEngineWebAppContext.java:94)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at com.google.appengine.tools.development.JettyContainerService$ApiProxyHandler.handle(JettyContainerService.java:370)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

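1 answer:

Answer 0 (score: 0)

First: the Conversion API is being decommissioned in Nov 2012. You should switch to something else.

As for your problem: try adding the "name" parameter to the Asset constructor: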

Asset fileAsset = new Asset("application/pdf", attachmentData, "filename.pdf");
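
To illustrate why the missing name would surface as a NullPointerException inside setName, here is a minimal, self-contained sketch. It does not use the App Engine SDK: Asset and AssetInfoBuilder below are hypothetical stand-ins written for this example, and the assumption that the two-argument constructor leaves the name null is inferred from the stack trace, not confirmed from the SDK source. The only behavior borrowed from real libraries is that protobuf-generated Java setters reject null values.

```java
import java.util.Objects;

class AssetNameSketch {

    // Hypothetical stand-in for the SDK's Asset class (assumption: the
    // two-argument constructor leaves the name unset, i.e. null).
    static final class Asset {
        final String mimeType;
        final byte[] data;
        final String name;

        Asset(String mimeType, byte[] data) {
            this(mimeType, data, null);
        }

        Asset(String mimeType, byte[] data, String name) {
            this.mimeType = mimeType;
            this.data = data;
            this.name = name;
        }
    }

    // Hypothetical stand-in for a protobuf-style builder. Like generated
    // protobuf setters, it throws NullPointerException on a null argument.
    static final class AssetInfoBuilder {
        private String mimeType;
        private String name;

        AssetInfoBuilder setMimeType(String mimeType) {
            this.mimeType = Objects.requireNonNull(mimeType);
            return this;
        }

        AssetInfoBuilder setName(String name) {
            this.name = Objects.requireNonNull(name);
            return this;
        }

        String describe() {
            return mimeType + ":" + name;
        }
    }

    // Mirrors the role of a converter that copies Asset fields into the builder.
    static String toProto(Asset asset) {
        return new AssetInfoBuilder()
                .setMimeType(asset.mimeType)
                .setName(asset.name)
                .describe();
    }

    // Returns true if converting the asset throws NullPointerException.
    static boolean failsWithNpe(Asset asset) {
        try {
            toProto(asset);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        // No name: the setter rejects null, so this prints true.
        System.out.println(failsWithNpe(new Asset("application/pdf", data)));
        // With a name: conversion succeeds, so this prints false.
        System.out.println(failsWithNpe(new Asset("application/pdf", data, "filename.pdf")));
    }
}
```

If the real SDK behaves the way this sketch assumes, passing any non-null file name to the three-argument constructor should avoid the exception, which matches the suggestion above.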