How do I get the entire file content using apache-poi?

Date: 2016-09-16 09:29:06

Tags: java ms-word apache-poi docx

I am trying to read a .docx file using the Java API Apache POI. I am using:

    HttpServerRequest request = routingContext.request();
    request.setExpectMultipart(true);
    request.endHandler(new Handler<Void>() {
        @Override
        public void handle(Void aVoid) {
            MultiMap entries = request.formAttributes();
            Set<String> names = entries.names();
            logger.info("UPLOAD_CONTENT: fileName = "+entries.get("fileName"));
            logger.info("UPLOAD_CONTENT: type = "+entries.get("type"));
            logger.info("UPLOAD_CONTENT: names = "+names);
            request.response().setChunked(true).end(createResponse("SUCCESS"));
        }
    });
    // This would be called multiple times
    request.uploadHandler(upload -> {

        upload.exceptionHandler(new Handler<Throwable>() {
            @Override
            public void handle(Throwable error) {
                logger.error("UPLOAD_CONTENT: Error while uploading content "+upload.filename());
                logger.error("UPLOAD_CONTENT: error = "+error.toString());
                error.printStackTrace();
                request.response().setChunked(true).end(createResponse("FAILURE"));
            }
        });
        upload.endHandler(new Handler<Void>() {
            @Override
            public void handle(Void aVoid) {
                logger.info("UPLOAD_CONTENT: fileName = "+upload.filename());
                logger.info("UPLOAD_CONTENT: name = "+upload.name());
                logger.info("UPLOAD_CONTENT: contentType = "+upload.contentType());
                logger.info("UPLOAD_CONTENT: size = "+upload.size());
                UtilityFunctions.uploadToS3(upload.filename(), "testfolder");

            }
        });
        upload.streamToFileSystem(upload.filename());
    });

In this case I only get the text of the file, but my file contains text, tables, images... How can I get the complete content of the file?

1 Answer:

Answer 0 (score: 0)

    import java.io.FileInputStream;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    String contents = "";

    // HWPF handles the legacy binary .doc format; for a .docx file,
    // POI's XWPFDocument and XWPFWordExtractor are the counterparts.
    try (FileInputStream in = new FileInputStream("D:/Resume.doc")) {
        System.out.println("Starting the test");
        POIFSFileSystem fs = new POIFSFileSystem(in);
        HWPFDocument doc = new HWPFDocument(fs);
        WordExtractor we = new WordExtractor(doc);
        // getText() returns the document text; table cells come back as
        // plain text and pictures are not included.
        contents = we.getText();
    } catch (Exception e) {
        logger.error(e.getMessage());
    }

`contents` then holds the extracted content of the file.
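
A text extractor only ever returns text, so tables and pictures have to be fetched separately. For a .docx, POI's `XWPFDocument` exposes `getBodyElements()`, `getTables()` and `getAllPictures()` for that purpose. It may also help to know that a .docx file is just a ZIP package: the body lives in `word/document.xml` and embedded images sit under `word/media/`. The sketch below uses only the JDK (no POI) to list every part of such a package with its raw bytes. The class name `DocxParts`, the method names, and the tiny in-memory stand-in package are assumptions made for illustration; a real file would be opened with a `FileInputStream` instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxParts {

    /** Reads every part of a .docx (a ZIP package) into a map of part name to raw bytes. */
    public static Map<String, byte[]> readParts(InputStream docx) throws IOException {
        Map<String, byte[]> parts = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                parts.put(entry.getName(), buf.toByteArray());
            }
        }
        return parts;
    }

    /** Hypothetical stand-in package, built in memory so the sketch runs without a real file. */
    public static byte[] samplePackage() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<w:document/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("word/media/image1.png"));
            zip.write(new byte[] {1, 2, 3}); // placeholder bytes, not a real PNG
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> parts = readParts(new ByteArrayInputStream(samplePackage()));
        for (Map.Entry<String, byte[]> e : parts.entrySet()) {
            // Parts under word/media/ are the embedded pictures.
            String kind = e.getKey().startsWith("word/media/") ? "image" : "xml";
            System.out.println(kind + "  " + e.getKey() + "  (" + e.getValue().length + " bytes)");
        }
    }
}
```

Reading the package directly is mostly useful for seeing what the file contains or for pulling out the image bytes; for tables and formatted runs, walking the document through POI's XWPF classes is likely the more practical route.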