Java PDF to Excel conversion

Date: 2016-10-26 05:50:44

Tags: java apache-poi pdf-reader

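I am extracting data from a PDF into Excel. The PDF also contains tables. I used iText to convert the PDF to text, and then converted the text to Excel with the help of Apache POI. However, I am unable to retrieve the data in a form that I can store in a database. I also tried PDFBox and Aspose and got the same result. If anyone knows how to do this, please help me solve this problem.

Here is my code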

// PDF to text using iText

            PdfReader reader = new PdfReader(
                    "C:\\Users\\mohmeds\\Desktop\\BOI_SCFS banking.pdf_page_1.pdf");
            PdfReaderContentParser parser = new PdfReaderContentParser(
                    reader);
            // PrintWriter out = new PrintWriter(new FileOutputStream(txt));
            // Collect the text of every page, not only the last one
            StringBuilder text = new StringBuilder();
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                TextExtractionStrategy strategy = parser.processContent(i,
                        new SimpleTextExtractionStrategy());
                text.append(strategy.getResultantText()).append('\n');
            }
            reader.close();
            String line = text.toString();

            // Text to Excel using Apache POI

            org.apache.poi.ss.usermodel.Workbook wb = new HSSFWorkbook();
            CreationHelper helper = wb.getCreationHelper();
            Sheet sheet = wb.createSheet("new sheet");
            System.out.println("link------->" + line);
            List<String> lines = IOUtils.readLines(new StringReader(line));

            for (int i = 0; i < lines.size(); i++) {
                // One row per text line, one cell per comma-separated value
                String[] str = lines.get(i).split(",");
                Row row = sheet.createRow(i);
                for (int j = 0; j < str.length; j++) {
                    row.createCell(j).setCellValue(
                            helper.createRichTextString(str[j]));
                }
            }

            // try-with-resources closes the stream even if write() fails
            try (FileOutputStream fileOut = new FileOutputStream(
                    "C:\\Users\\mohmeds\\Desktop\\someName1.xls")) {
                wb.write(fileOut);
            }

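1 Answer:

Answer 0 (score: 0)

Your question is a bit vague, but if you want to store the data from the PDF in a database, you may want to extract the data as CSV rather than Excel. The code linked here also saves you the intermediate steps of converting the PDF to text and then the text to Excel. When defining the format, choose "csv":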


https://github.com/pdftables/java-pdftables-api/blob/master/pdftables.java
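
Once the data is available as CSV, loading it into a database could look roughly like the sketch below. It is not taken from the linked code: `CsvToDatabase`, `parseLine`, `insertSql` and `insertRows` are hypothetical names, the table name and column count are assumed to be known and trusted, and the parser handles quoted fields on a single line only (no line breaks inside a field).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CsvToDatabase {

    // Parses one CSV line, honouring double-quoted fields and "" escapes.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Builds "INSERT INTO <table> VALUES (?, ?, ...)" with one placeholder per column.
    // The table name is concatenated as-is, so it must come from trusted code.
    static String insertSql(String table, int columns) {
        return "INSERT INTO " + table + " VALUES ("
                + String.join(", ", Collections.nCopies(columns, "?")) + ")";
    }

    // Inserts all CSV lines in one batch. Rows with fewer than `columns`
    // fields are padded with NULL; extra fields are ignored.
    static void insertRows(Connection conn, String table, int columns,
                           List<String> csvLines) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table, columns))) {
            for (String csvLine : csvLines) {
                List<String> fields = parseLine(csvLine);
                for (int i = 0; i < columns; i++) {
                    ps.setString(i + 1, i < fields.size() ? fields.get(i) : null);
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

All values are inserted as strings here; converting dates or amounts to proper column types would need extra handling that depends on the table schema.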