Extract an MS Word table cell as an image?

Time: 2016-06-22 07:55:16

Tags: java apache-poi

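I need to extract table cells as images. A cell may contain mixed content (text + images), and I need to merge all of it into a single image. I am able to get the plain text, but I don't know how to get the images together with the text. I'm not sure whether Apache POI can help here.

Has anyone done something like this before?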

import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

// Prints the text of every cell of every table in the document.
public static void readTablesDataInDocx(XWPFDocument doc) {
    int tableIdx = 1;
    List<XWPFTable> tables = doc.getTables();
    System.out.println("==========No Of Tables in Document=============================================" + tables.size());
    for (XWPFTable xwpfTable : tables) {
        System.out.println("================table -" + tableIdx + "===Data==");
        int rowIdx = 1;
        for (XWPFTableRow xwpfTableRow : xwpfTable.getRows()) {
            System.out.println("Row -" + rowIdx);
            int colIdx = 1;
            for (XWPFTableCell xwpfTableCell : xwpfTableRow.getTableCells()) {
                if (xwpfTableCell != null) {
                    // getText() returns only the text of the cell, not its pictures.
                    System.out.print("\t" + colIdx + "- column value: " + xwpfTableCell.getText());
                }
                colIdx++;
            }
            System.out.println("");
            rowIdx++;
        }
        tableIdx++;
        System.out.println("");
    }
}

Right now I can get the text with this call:

System.out.print("\t" + colIdx + "- column value: " + xwpfTableCell.getText());

If the cell also contains an image, how do I get the image?

2 answers:

Answer 0: (score: 4)

Try this code; it works for me.

XWPFDocument doc = new XWPFDocument(new FileInputStream(fileName));
List<XWPFTable> table = doc.getTables();
for (XWPFTable xwpfTable : table) {
    List<XWPFTableRow> row = xwpfTable.getRows();
    for (XWPFTableRow xwpfTableRow : row) {
        List<XWPFTableCell> cell = xwpfTableRow.getTableCells();
        for (XWPFTableCell xwpfTableCell : cell) {
            if (xwpfTableCell != null) {
                // Text content of the cell.
                System.out.println(xwpfTableCell.getText());
                // Pictures are attached to the runs inside the cell's paragraphs.
                for (XWPFParagraph p : xwpfTableCell.getParagraphs()) {
                    for (XWPFRun run : p.getRuns()) {
                        for (XWPFPicture pic : run.getEmbeddedPictures()) {
                            // Raw bytes of the embedded image file.
                            byte[] pictureData = pic.getPictureData().getData();
                            System.out.println("picture : " + pictureData.length + " bytes");
                        }
                    }
                }
            }
        }
    }
}
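
The byte array above is the raw content of the embedded image file, so it still has to be decoded before it can be drawn anywhere. A minimal sketch of that step, using only the JDK: `PictureBytes`, `sniffFormat` and `decode` are hypothetical names made up for this illustration and are not part of Apache POI. It assumes the picture is in a format that `ImageIO` can read; Word documents can also embed vector formats such as EMF or WMF, which the stock JDK readers do not handle.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper (not part of Apache POI): inspects and decodes raw picture bytes. */
final class PictureBytes {

    private PictureBytes() {}

    /** Guesses the image format from the leading magic bytes; "unknown" if none match. */
    static String sniffFormat(byte[] data) {
        if (startsWith(data, 0x89, 'P', 'N', 'G')) return "png";
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (startsWith(data, 'G', 'I', 'F', '8')) return "gif";
        if (startsWith(data, 'B', 'M')) return "bmp";
        return "unknown";
    }

    /** Compares the first bytes of data (as unsigned values) against the given prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Decodes the bytes into a BufferedImage, or throws if no ImageIO reader accepts them. */
    static BufferedImage decode(byte[] data) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));
        if (image == null) {
            // ImageIO.read returns null when no registered reader claims the stream.
            throw new IOException("No ImageIO reader for format: " + sniffFormat(data));
        }
        return image;
    }
}
```

Inside the innermost loop you would then call something like `PictureBytes.decode(pictureData)` and keep the resulting `BufferedImage` for later drawing.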

Answer 1: (score: 2)

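If you have the cell, you can get the paragraphs that make it up. Each paragraph consists of runs, which you can get by calling the getRuns method. A run can itself contain embedded pictures, which you can get by calling the getEmbeddedPictures method.

So you could use a method like this to get the pictures embedded in a cell: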

public static void printDescriptionOfImagesInCell(XWPFTableCell cell) {
    List<XWPFParagraph> paragraphs = cell.getParagraphs();
    for (XWPFParagraph paragraph : paragraphs) {
        List<XWPFRun> runs = paragraph.getRuns();
        for (XWPFRun run : runs) {
            List<XWPFPicture> pictures = run.getEmbeddedPictures();
            for (XWPFPicture picture : pictures) {
                //Do anything you want with the picture:
                System.out.println("Picture: " + picture.getDescription());
            }
        }
    }
}

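You should be able to find out more about the actual picture from the documentation for XWPFPicture, and adapt the method to actually get the image data, name, and so on.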
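
Neither answer covers the last part of the question, merging the text and the pictures into a single image. One possible approach is to draw both onto a `BufferedImage` with plain Java 2D. The sketch below is only that, a sketch: `CellImageComposer`, `compose` and `PADDING` are hypothetical names invented here, the text is drawn in the default Java 2D font, and the layout simply stacks the text lines above the pictures. It makes no attempt to reproduce Word's real layout (fonts, wrapping, inline picture positions).

```java
import java.awt.Color;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

/** Hypothetical helper (not part of Apache POI): draws a cell's text and pictures into one image. */
final class CellImageComposer {

    /** Assumed margin in pixels around the content and above each picture. */
    static final int PADDING = 8;

    private CellImageComposer() {}

    /** Stacks the text lines at the top and the pictures below them on a white canvas. */
    static BufferedImage compose(List<String> lines, List<BufferedImage> pictures) {
        // Measure the text with a scratch graphics context; skipped when there is no text.
        int lineHeight = 0;
        int ascent = 0;
        int textWidth = 0;
        if (!lines.isEmpty()) {
            Graphics2D scratch = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB).createGraphics();
            FontMetrics metrics = scratch.getFontMetrics();
            lineHeight = metrics.getHeight();
            ascent = metrics.getAscent();
            for (String line : lines) {
                textWidth = Math.max(textWidth, metrics.stringWidth(line));
            }
            scratch.dispose();
        }

        // Content size: widest element, and the sum of all heights plus a gap above each picture.
        int width = textWidth;
        int height = lines.size() * lineHeight;
        for (BufferedImage picture : pictures) {
            width = Math.max(width, picture.getWidth());
            height += PADDING + picture.getHeight();
        }

        BufferedImage canvas = new BufferedImage(
                width + 2 * PADDING, height + 2 * PADDING, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, canvas.getWidth(), canvas.getHeight());
            g.setColor(Color.BLACK);
            int y = PADDING;
            for (String line : lines) {
                g.drawString(line, PADDING, y + ascent); // drawString expects the baseline
                y += lineHeight;
            }
            for (BufferedImage picture : pictures) {
                y += PADDING;
                g.drawImage(picture, PADDING, y, null);
                y += picture.getHeight();
            }
        } finally {
            g.dispose();
        }
        return canvas;
    }
}
```

To use it, you would collect the cell text (for example split into lines) and the decoded pictures of each run, call `compose`, and write the result with `ImageIO.write(image, "png", file)`. If the output has to look like the cell does in Word, a different route is probably needed, such as converting the document with a tool that actually lays it out and cropping the result.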