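I converted a PDF document to a Word document using ABBYY FineReader. XWPFTable (Apache POI) does not recognize the table in the resulting Word document.
The table has the following format: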
Heading1 Heading2 Heading3 Heading4
Sub-heading1 Sub-heading2
2011 36.66 ABC 24,000 C
2012 46.90 ABC 78,000 C
ABC 90,000 D
Here is my code:
import java.io.FileInputStream;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class TableExtraction {
    public static void main(String[] args) {
        try {
            FileInputStream fis = new FileInputStream("<path to docx file>");
            XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis));
            Iterator<IBodyElement> bodyElementIterator = xdoc.getBodyElementsIterator();
            while (bodyElementIterator.hasNext()) {
                IBodyElement element = bodyElementIterator.next();
                if ("TABLE".equalsIgnoreCase(element.getElementType().name())) {
                    System.out.println("Table Data");
                    List<XWPFTable> tableList = element.getBody().getTables();
                    for (XWPFTable table : tableList) {
                        System.out.println("Total Number of Rows of Table:" + table.getNumberOfRows());
                        System.out.println(table.getText());
                    }
                } else {
                    System.out.println("Not a Table Data");
                }
            }
            xdoc.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
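Output:
Not a Table Data
Answer 0 (score: 0)
I tried your code on a Word table of my own, but it did not work. Assuming yours is a regular Word table, you can iterate over the tables directly like this: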
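import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class TableIteration {
    // Placeholder: replace with the path to your .docx file
    private static final String FILE_NAME = "<path to docx file>";

    public static void main(String[] args) throws IOException {
        try (FileInputStream fis = new FileInputStream(FILE_NAME);
             XWPFDocument xdoc = new XWPFDocument(fis)) {
            for (XWPFTable table : xdoc.getTables()) {
                System.out.println(table.getRows().size());
                // in case you want to do more with the table cells...
                for (XWPFTableRow row : table.getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        for (XWPFParagraph para : cell.getParagraphs()) {
                            System.out.println(para.getText());
                        }
                    }
                }
            }
        }
    }
}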
If this does not work either, something probably went wrong during the PDF conversion.
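One way to check whether the conversion is at fault is to look at the raw XML inside the .docx file, which is an ordinary ZIP archive. The sketch below uses only the JDK, no POI. It counts the `w:tbl` elements in the main document part and how many of them sit directly under `w:body`. The class name `DocxTableProbe` and the method `countTables` are made up for this illustration. It also assumes the main part is stored at `word/document.xml` and uses the transitional WordprocessingML namespace, which is the common case but not guaranteed for every file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical diagnostic helper; not part of Apache POI.
class DocxTableProbe {
    // Transitional WordprocessingML namespace (assumed; strict-mode files use another one).
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns {all w:tbl elements, w:tbl elements that are direct children of w:body}.
    static int[] countTables(InputStream documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(documentXml);
            NodeList tables = doc.getElementsByTagNameNS(W_NS, "tbl");
            int topLevel = 0;
            for (int i = 0; i < tables.getLength(); i++) {
                Node parent = tables.item(i).getParentNode();
                if (parent != null
                        && W_NS.equals(parent.getNamespaceURI())
                        && "body".equals(parent.getLocalName())) {
                    topLevel++;
                }
            }
            return new int[] {tables.getLength(), topLevel};
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the .docx file produced by the PDF conversion
        try (ZipFile zip = new ZipFile(args[0])) {
            ZipEntry entry = zip.getEntry("word/document.xml"); // assumed location
            if (entry == null) {
                System.out.println("word/document.xml not found in archive");
                return;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                int[] counts = countTables(in);
                System.out.println("w:tbl elements in total: " + counts[0]);
                System.out.println("w:tbl elements directly under w:body: " + counts[1]);
            }
        }
    }
}
```

If the total is 0, the converter most likely did not write a real Word table at all and laid the values out as ordinary text instead. If the total is larger than the body-level count, some tables are nested inside other containers such as text boxes (`w:txbxContent`) or other table cells. As far as I can tell, `xdoc.getTables()` only returns body-level tables, so such nested tables would not show up there.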