Extracting tables from a Word document with the Apache POI library

Time: 2015-01-15 07:52:09

Tags: java ms-word apache-poi

  

Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. POI only supports OLE2 Office documents
    at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:96)
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:84)
    at com.TableTest.main(TableTest.java:19)

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Table;
import org.apache.poi.hwpf.usermodel.TableCell;
import org.apache.poi.hwpf.usermodel.TableRow;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class TableTest {

    public static void main(String args[]) throws IOException
    {
//        String fileName="D:\\New folder\\Annual.doc";

        InputStream fis=new FileInputStream("D://New folder//Annual.docx");
        POIFSFileSystem fs=new POIFSFileSystem(fis);
        HWPFDocument doc=new HWPFDocument(fs);

        Range range=doc.getRange();

        for(int i=0;i<range.numParagraphs();i++)
        {
            Paragraph par=range.getParagraph(i);
            System.out.println(par.text());
        }
        Paragraph tablePar=range.getParagraph(0);
        if(tablePar.isInTable())
        {
            Table table=range.getTable(tablePar);
            for(int rowIdx=0;rowIdx<table.numRows();rowIdx++)
            {
                TableRow row=table.getRow(rowIdx);
                System.out.println("row "+(rowIdx+1)+",is table header: "+row.isTableHeader());
                for(int colIdx=0;colIdx<row.numCells();colIdx++)
                {
                    TableCell cell=row.getCell(colIdx);
                    System.out.println("column "+(colIdx+1)+",text= "+cell.getParagraph(0).text());
                }
            }
        }
    }

}

1 answer:

Answer 0: (score: 0)

HWPF is for OLE2-based .doc files. For .docx files, you need to use XWPF.
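The two formats can be told apart from their first bytes, which is also why the extension alone is not a reliable guide. The following is a minimal sketch using only the JDK; the class name WordFormatSniffer and its methods are made up for illustration and are not part of POI. A ZIP signature only shows that the file is a ZIP package, so treating it as .docx is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not a POI class.
class WordFormatSniffer {

    enum Format { OLE2, ZIP_BASED, UNKNOWN }

    // Signature at the start of every OLE2 compound file (legacy .doc).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4"; a .docx is a ZIP package.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    // Classifies a file from its leading bytes.
    static Format detect(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Format.ZIP_BASED;
        return Format.UNKNOWN;
    }

    // Reads at most the first 8 bytes of the file and classifies them.
    static Format detect(Path file) throws IOException {
        byte[] head = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) break; // file shorter than 8 bytes
                filled += n;
            }
        }
        return detect(Arrays.copyOf(head, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WordFormatSniffer <file>");
            return;
        }
        switch (detect(Paths.get(args[0]))) {
            case OLE2:      System.out.println("OLE2 (.doc): use HWPF"); break;
            case ZIP_BASED: System.out.println("ZIP package (likely .docx): use XWPF"); break;
            default:        System.out.println("not recognised as a Word file");
        }
    }
}
```

Run against the file from the question, this should report a ZIP package, which matches the OfficeXmlFileException thrown by POIFSFileSystem.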

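Try XWPFDocument. Build it from the input stream itself rather than from a POIFSFileSystem, because POIFSFileSystem is what throws the exception on a .docx:

XWPFDocument doc=new XWPFDocument(fis);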
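For the table extraction the question asks about, the HWPF classes (Range, Table, TableRow, TableCell) have XWPF counterparts. Below is a minimal sketch, assuming the poi-ooxml jar and its dependencies are on the classpath; the class name DocxTableDump and the method readTables are made up for illustration, and the default path is simply the one from the question. It only walks top-level tables in the document body, not tables nested inside cells.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

// Hypothetical example class, not part of POI.
class DocxTableDump {

    // Returns each top-level table as a list of rows, each row as a list of cell texts.
    static List<List<List<String>>> readTables(InputStream in) throws IOException {
        List<List<List<String>>> tables = new ArrayList<>();
        // The document is opened straight from the stream, with no POIFSFileSystem.
        try (XWPFDocument doc = new XWPFDocument(in)) {
            for (XWPFTable table : doc.getTables()) {
                List<List<String>> rows = new ArrayList<>();
                for (XWPFTableRow row : table.getRows()) {
                    List<String> cells = new ArrayList<>();
                    for (XWPFTableCell cell : row.getTableCells()) {
                        cells.add(cell.getText());
                    }
                    rows.add(cells);
                }
                tables.add(rows);
            }
        }
        return tables;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question; pass another one as the first argument.
        String path = args.length > 0 ? args[0] : "D://New folder//Annual.docx";
        try (InputStream fis = new FileInputStream(path)) {
            List<List<List<String>>> tables = readTables(fis);
            for (int t = 0; t < tables.size(); t++) {
                List<List<String>> rows = tables.get(t);
                for (int r = 0; r < rows.size(); r++) {
                    List<String> cells = rows.get(r);
                    for (int c = 0; c < cells.size(); c++) {
                        System.out.println("table " + (t + 1) + ", row " + (r + 1)
                                + ", column " + (c + 1) + ", text= " + cells.get(c));
                    }
                }
            }
        }
    }
}
```

Unlike the HWPF code in the question, this does not go through paragraphs to find a table: XWPFDocument exposes the tables directly through getTables().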