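I am extracting data from a PDF into Excel. The PDF also contains tables. I get good results using iText to convert the PDF to text and Apache POI to convert the text to Excel, but I cannot retrieve the data to store in a database. I tried PDFBox and Aspose as well and got the same result. If anyone knows how, please help me solve this problem.
Here is my code: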
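// Imports (assumed: iText 5 package names, Apache Commons IO for IOUtils)
import java.io.FileOutputStream;
import java.io.StringReader;
import java.util.List;
import org.apache.commons.io.IOUtils;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.CreationHelper;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

// pdf to text using itext
PdfReader reader = new PdfReader(
        "C:\\Users\\mohmeds\\Desktop\\BOI_SCFS banking.pdf_page_1.pdf");
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
// Accumulate the text of every page; assigning to one String inside
// the loop would keep only the last page.
StringBuilder text = new StringBuilder();
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    TextExtractionStrategy strategy = parser.processContent(i,
            new SimpleTextExtractionStrategy());
    text.append(strategy.getResultantText()).append('\n');
}
reader.close();
String line = text.toString();

// using apache poi text to excel converter
org.apache.poi.ss.usermodel.Workbook wb = new HSSFWorkbook();
CreationHelper helper = wb.getCreationHelper();
Sheet sheet = wb.createSheet("new sheet");
System.out.println("link------->" + line);
List<String> lines = IOUtils.readLines(new StringReader(line));
for (int i = 0; i < lines.size(); i++) {
    // Each text line is split on commas; a line without commas
    // ends up in a single cell.
    String[] str = lines.get(i).split(",");
    Row row = sheet.createRow(i);
    for (int j = 0; j < str.length; j++) {
        row.createCell(j).setCellValue(
                helper.createRichTextString(str[j]));
    }
}
// try-with-resources closes the stream even if write() throws
try (FileOutputStream fileOut = new FileOutputStream(
        "C:\\Users\\mohmeds\\Desktop\\someName1.xls")) {
    wb.write(fileOut);
}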
Answer 0: (score: 0)
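Your question is a bit vague, but if you want to store data from a PDF in a database, you probably want to extract the data as CSV rather than Excel. The code linked here also skips the intermediate steps of converting the PDF to text and then the text to Excel. When specifying the format, choose "csv":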
https://github.com/pdftables/java-pdftables-api/blob/master/pdftables.java
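For illustration, the CSV route could look roughly like the sketch below, which posts a PDF to the PDFTables web API with Java's built-in HTTP client (Java 11+). This is not the code from the linked repository. The endpoint `https://pdftables.com/api`, the query parameters `key` and `format`, and the form field name `f` are assumptions based on the service's public documentation and should be checked against it; the class name `PdfTablesCsvSketch` and its helper methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PdfTablesCsvSketch {
    // Assumed endpoint; verify against the PDFTables API documentation.
    static final String ENDPOINT = "https://pdftables.com/api";

    // Builds the request URI; "csv" is passed as the output format.
    static URI buildUri(String apiKey, String format) {
        return URI.create(ENDPOINT
                + "?key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&format=" + URLEncoder.encode(format, StandardCharsets.UTF_8));
    }

    // Builds a single-part multipart/form-data body carrying the PDF bytes.
    // The field name "f" is an assumption about what the service expects.
    static byte[] multipartBody(String boundary, String fileName, byte[] pdf) {
        byte[] head = ("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"f\"; filename=\""
                + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n")
                .getBytes(StandardCharsets.UTF_8);
        byte[] tail = ("\r\n--" + boundary + "--\r\n")
                .getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(pdf, 0, pdf.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    // Uploads the PDF and returns the response body (CSV text on success).
    static String convertToCsv(String apiKey, Path pdf)
            throws IOException, InterruptedException {
        String boundary = "pdftables-" + System.nanoTime();
        byte[] body = multipartBody(boundary,
                pdf.getFileName().toString(), Files.readAllBytes(pdf));
        HttpRequest request = HttpRequest.newBuilder(buildUri(apiKey, "csv"))
                .header("Content-Type",
                        "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Conversion failed with HTTP status "
                    + response.statusCode());
        }
        return response.body();
    }
}
```

A call such as `PdfTablesCsvSketch.convertToCsv("YOUR_API_KEY", Path.of("input.pdf"))` (both arguments are placeholders) would then return CSV text that can be parsed row by row and inserted into a database table, with no intermediate Excel file.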