Question

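Tabula looks like an excellent tool for extracting tabular data from PDFs. There are plenty of examples of calling it from the command line or using it from Python, but there seems to be no documentation for Java. Does anyone have a working example?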

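Note that Tabula does provide source code, but there seems to be some confusion between versions. For example, the example on GitHub references a TableExtractor class that does not appear to exist in the JAR.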

https://github.com/tabulapdf/tabula-java

Answer 1

You can call Tabula from Java with the following code. I hope this helps.

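  import java.io.File;
  import java.io.IOException;
  import java.util.List;

  import org.apache.pdfbox.pdmodel.PDDocument;
  import technology.tabula.ObjectExtractor;
  import technology.tabula.Page;
  import technology.tabula.RectangularTextContainer;
  import technology.tabula.Table;
  import technology.tabula.extractors.SpreadsheetExtractionAlgorithm;

  // The class name is arbitrary; tabula-java and PDFBox must be on the classpath.
  public class TabulaExample {
    public static void main(String[] args) throws IOException {
      final String FILENAME = "../test.pdf";

      // PDDocument.load is the PDFBox 2.x API; try-with-resources closes the file.
      try (PDDocument pd = PDDocument.load(new File(FILENAME))) {
        int totalPages = pd.getNumberOfPages();
        System.out.println("Total Pages in Document: " + totalPages);

        ObjectExtractor oe = new ObjectExtractor(pd);
        SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
        Page page = oe.extract(1); // page numbers start at 1

        // Detect the tables on the page, then print the text of every cell
        List<Table> tables = sea.extract(page);
        for (Table table : tables) {
          List<List<RectangularTextContainer>> rows = table.getRows();

          for (List<RectangularTextContainer> cells : rows) {
            for (RectangularTextContainer cell : cells) {
              System.out.print(cell.getText() + "|");
            }
            System.out.println(); // end the line after each table row
          }
        }
      }
    }
  }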
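The code above prints each cell followed by "|", which breaks down as soon as a cell contains the delimiter itself. As a minimal sketch, and not part of Tabula, the helper below formats cell texts as CSV with basic quoting. The class `CellRowsToCsv` and its methods are hypothetical names, and the plain `List<List<String>>` input is an assumption standing in for the values returned by `getText()`.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: formats cell texts (e.g. collected via getText()) as CSV. */
final class CellRowsToCsv {

    /** Quotes a field if it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        String value = field == null ? "" : field;
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        // Double any embedded quotes, then wrap the whole field in quotes
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins the cells of one table row into a single CSV line. */
    static String toCsvLine(List<String> cells) {
        return cells.stream().map(CellRowsToCsv::escape).collect(Collectors.joining(","));
    }

    /** Joins all rows, one CSV line per table row. */
    static String toCsv(List<List<String>> rows) {
        return rows.stream().map(CellRowsToCsv::toCsvLine).collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Made-up sample data, only to show the quoting
        List<List<String>> rows = List.of(
                List.of("Item", "Qty"),
                List.of("Bolt, M6", "12"));
        System.out.println(toCsv(rows));
    }
}
```

To use it with the extraction loop, collect `cell.getText()` for each row into a `List<String>` and pass the result to `toCsvLine`.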

Answer 2

    // ****** Extract text from the detected tables and write it to an XLSX file ******
    // Assumes `sea` and `page` are set up as in Answer 1; needs Apache POI (poi-ooxml).
    XSSFWorkbook wb = new XSSFWorkbook();
    Sheet sheet = wb.createSheet("Barang Baik");
    List<Table> tables = sea.extract(page);
    int rowNumber = 0; // next free row in the sheet, shared across all tables
    for (Table t : tables) {
        List<List<RectangularTextContainer>> rows = t.getRows();
        for (List<RectangularTextContainer> cells : rows) {
            Row row = sheet.createRow(rowNumber++);
            for (int j = 0; j < cells.size(); j++) {
                Cell cell = row.createCell(j);
                cell.setCellValue(cells.get(j).getText());
            }
        }
    }
    // Write the workbook once, after every table has been copied
    try (FileOutputStream fos = new FileOutputStream("C:\\your\\file.xlsx")) {
        wb.write(fos);
    }
    wb.close();
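The snippet above stacks every detected table in one sheet, so each table has to start at the first free row below the previous one. As a rough sketch of that bookkeeping alone, with no POI dependency, the hypothetical helper `RowOffsets.startRows` below computes the starting row of each table from the tables' row counts. The names and the sample counts are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: first sheet row for each table when tables are stacked vertically. */
final class RowOffsets {

    /** rowCounts.get(i) is the number of rows in table i; the result holds its starting row. */
    static List<Integer> startRows(List<Integer> rowCounts) {
        List<Integer> starts = new ArrayList<>();
        int next = 0; // next free row index, 0-based like Sheet.createRow
        for (int count : rowCounts) {
            starts.add(next);
            next += count;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Three tables with 3, 2 and 4 rows
        System.out.println(startRows(List.of(3, 2, 4))); // prints [0, 3, 5]
    }
}
```

A running counter like `next` gives the same placement without having to probe the sheet for existing rows.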

How do I call Tabula (JAR) from Java?
