Question

I converted a PDF document to a Word document using ABBYY FineReader. XWPFTable (Apache POI) does not recognize the table in the resulting Word document.

Here is the table format:

Heading1        Heading2       Heading3  Heading4
Sub-heading1    Sub-heading2         
2011            36.66          ABC       24,000 C
2012            46.90          ABC       78,000 C
                               ABC       90,000 D

Here is my code:

import java.io.FileInputStream;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class TableExtraction {
  public static void main(String[] args) {
    try {
      FileInputStream fis = new FileInputStream("<path to docx file>");
      XWPFDocument xdoc=new XWPFDocument(OPCPackage.open(fis));
      Iterator<IBodyElement> bodyElementIterator = xdoc.getBodyElementsIterator();
      while(bodyElementIterator.hasNext()) {
        IBodyElement element = bodyElementIterator.next();
        if("TABLE".equalsIgnoreCase(element.getElementType().name())) {
          System.out.println("Table Data");
          List<XWPFTable> tableList =  element.getBody().getTables();
          for (XWPFTable table: tableList) {
            System.out.println("Total Number of Rows of Table:" + table.getNumberOfRows());
            System.out.println(table.getText());
          }
        }
        else {
          System.out.println("Not a Table Data"); 
        }
      }
      xdoc.close();
    }
    catch(Exception ex) {
      ex.printStackTrace();
    } 
  }
}

Output:

Not a Table Data

Answer 1

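I tried your code on a Word table of my own, but it did not work. Assuming it is a regular Word table, you can iterate over the tables directly like this: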

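import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class TableExtraction {
    // Placeholder: replace with the path to your .docx file
    private static final String FILE_NAME = "<path to docx file>";

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the document and the stream even if reading fails
        try (FileInputStream fis = new FileInputStream(FILE_NAME);
             XWPFDocument xdoc = new XWPFDocument(fis)) {
            // getTables() lists the tables found directly in the document body
            for (XWPFTable table : xdoc.getTables()) {
                System.out.println(table.getRows().size());

                // in case you want to do more with the table cells...
                for (XWPFTableRow row : table.getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        for (XWPFParagraph para : cell.getParagraphs()) {
                            System.out.println(para.getText());
                        }
                    }
                }
            }
        }
    }
}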

If this does not work, something probably went wrong during the PDF conversion.
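
One way to check whether the conversion is at fault is to look at the raw XML inside the .docx, which is a ZIP archive. The sketch below is not part of the original answer. It uses only the Java standard library (Java 9 or later, for readAllBytes), the class name DocxTableProbe is made up for illustration, and it assumes the file uses the conventional w: namespace prefix. It counts the table elements (<w:tbl>), text box content elements (<w:txbxContent>), and tab characters (<w:tab/>) in word/document.xml.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of Apache POI.
public class DocxTableProbe {

    /** Reads word/document.xml from the .docx (a ZIP archive) as a string. */
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    /** Counts non-overlapping occurrences of token in text. */
    static int countOccurrences(String text, String token) {
        int count = 0;
        for (int i = text.indexOf(token); i >= 0; i = text.indexOf(token, i + token.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocxTableProbe <path to docx file>");
            return;
        }
        String xml = readDocumentXml(args[0]);
        // Assumes the "w:" prefix; a file using another prefix would report 0 everywhere.
        System.out.println("w:tbl elements:         " + countOccurrences(xml, "<w:tbl>"));
        System.out.println("w:txbxContent elements: " + countOccurrences(xml, "<w:txbxContent>"));
        System.out.println("w:tab characters:       " + countOccurrences(xml, "<w:tab/>"));
    }
}
```

If the w:tbl count is 0, the file contains no real table, and the converter most likely laid out the columns with tabs or spaces, so there is nothing for XWPFTable to find. If tables exist but text boxes do too, the tables may sit inside text boxes, which xdoc.getTables() would not be expected to reach because it lists tables in the document body. In either case, re-exporting from the converter with a different layout option may be worth trying.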
