I am trying to read a .docx file using the Apache POI Java API. I use:
HttpServerRequest request = routingContext.request();
request.setExpectMultipart(true);
request.endHandler(new Handler<Void>() {
    @Override
    public void handle(Void aVoid) {
        MultiMap entries = request.formAttributes();
        Set<String> names = entries.names();
        logger.info("UPLOAD_CONTENT: fileName = " + entries.get("fileName"));
        logger.info("UPLOAD_CONTENT: type = " + entries.get("type"));
        logger.info("UPLOAD_CONTENT: names = " + names);
        request.response().setChunked(true).end(createResponse("SUCCESS"));
    }
});
// This would be called multiple times
request.uploadHandler(upload -> {
    upload.exceptionHandler(new Handler<Throwable>() {
        @Override
        public void handle(Throwable error) {
            logger.error("UPLOAD_CONTENT: Error while uploading content " + upload.filename());
            logger.error("UPLOAD_CONTENT: error = " + error.toString());
            error.printStackTrace();
            request.response().setChunked(true).end(createResponse("FAILURE"));
        }
    });
    upload.endHandler(new Handler<Void>() {
        @Override
        public void handle(Void aVoid) {
            logger.info("UPLOAD_CONTENT: fileName = " + upload.filename());
            logger.info("UPLOAD_CONTENT: name = " + upload.name());
            logger.info("UPLOAD_CONTENT: contentType = " + upload.contentType());
            logger.info("UPLOAD_CONTENT: size = " + upload.size());
            UtilityFunctions.uploadToS3(upload.filename(), "testfolder");
        }
    });
    upload.streamToFileSystem(upload.filename());
});
In this case I only get the text of the file, but my file contains text, tables, pictures, and so on. How can I get the complete content of the file?
Answer 0 (score: 0)
String contents = "";
try {
    System.out.println("Starting the test");
    // HWPFDocument reads the legacy binary .doc format; a .docx file needs XWPFDocument instead.
    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("D:/Resume.doc"));
    HWPFDocument doc = new HWPFDocument(fs);
    WordExtractor we = new WordExtractor(doc);
    // WordExtractor returns plain text only; pictures and table structure are not preserved.
    contents = we.getText();
    we.close();
} catch (Exception e) {
    logger.error(e.getMessage());
}
After this, contents holds the text extracted from the file.
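The snippet above opens a .doc file, while the question is about .docx. For that format, a minimal sketch might look like the following, assuming the poi-ooxml artifact is on the classpath. The class name DocxContentReader, the method describe, and the default path D:/Resume.docx are hypothetical names chosen for illustration. It walks the body elements in order to reach paragraphs and table cells, then lists the embedded pictures separately.

```java
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of Apache POI.
public class DocxContentReader {

    /**
     * Summarizes a .docx stream: paragraphs and table cells in body order,
     * followed by the embedded pictures.
     */
    public static String describe(InputStream in) throws IOException {
        StringBuilder out = new StringBuilder();
        try (XWPFDocument doc = new XWPFDocument(in)) {
            // Body elements are the top-level paragraphs and tables.
            for (IBodyElement element : doc.getBodyElements()) {
                if (element instanceof XWPFParagraph) {
                    out.append("PARAGRAPH: ")
                       .append(((XWPFParagraph) element).getText())
                       .append('\n');
                } else if (element instanceof XWPFTable) {
                    for (XWPFTableRow row : ((XWPFTable) element).getRows()) {
                        for (XWPFTableCell cell : row.getTableCells()) {
                            out.append("CELL: ").append(cell.getText()).append('\n');
                        }
                    }
                }
            }
            // Pictures are stored as separate parts of the package.
            for (XWPFPictureData picture : doc.getAllPictures()) {
                out.append("PICTURE: ").append(picture.getFileName())
                   .append(" (").append(picture.getData().length).append(" bytes)\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass a real file as the first argument.
        String path = args.length > 0 ? args[0] : "D:/Resume.docx";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.print(describe(in));
        }
    }
}
```

The bytes returned by picture.getData() can be written to disk or uploaded as they are. This sketch does not cover headers, footers, or tables nested inside cells, which would need extra handling.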