Getting document summary information from a Word document

Posted: 2014-11-29 15:21:07

Tags: java apache ms-word apache-poi

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.OfficeXmlFileException;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

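public class Init {

    public static void main(String[] argc) throws IOException {
        // HWPFDocument reads only the legacy binary .doc (OLE2) format.
        // The .docx file below is Office Open XML, which triggers the exception shown next.
        HWPFDocument myWorkBook = new HWPFDocument(
                new POIFSFileSystem(
                        new FileInputStream("/home/jashka/Лаби/6.docx")));

        System.out.println(myWorkBook.getDocumentSummaryInformation());
    }
}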

Error message:

Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException:
The supplied data appears to be in the Office 2007+ XML. You are calling the part
of POI that deals with OLE2 Office Documents. You need to call a different part of POI
to process this data (eg XSSF instead of HSSF)

1 answer:

Answer 0 (score: 2)

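Use XWPFDocument instead of HWPFDocument. A .docx file is stored in the Office 2007+ XML (OOXML) format, which HWPFDocument cannot read; XWPFDocument, from the poi-ooxml module, handles it. Note that XWPFDocument has no getDocumentSummaryInformation() method; its metadata is exposed through getProperties() (core and extended properties) instead.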
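The exception occurs because the file's container format does not match the POI component being called. As a minimal sketch (not part of the original answer), the stdlib-only class below tells the two formats apart by their leading signature bytes, so a program can choose between HWPFDocument and XWPFDocument before opening the file. The class name WordFormatSniffer is hypothetical, and the check is a heuristic: a ZIP signature only shows the file is a ZIP package, not that it is a valid .docx.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: classifies a Word file by its first bytes (requires Java 9+ for readNBytes).
public class WordFormatSniffer {

    public enum Format { OLE2, OOXML, UNKNOWN }

    // OLE2 compound-file signature used by legacy .doc files (the format HWPFDocument reads).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // ZIP local-file-header signature "PK\3\4"; .docx files are ZIP packages (read by XWPFDocument).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies an in-memory header; a ZIP match is only a hint that the data is OOXML.
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Format.OOXML;
        return Format.UNKNOWN;
    }

    // Reads at most the first 8 bytes of the file and classifies them.
    public static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2_MAGIC.length];
            int n = in.readNBytes(header, 0, header.length);
            return detect(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WordFormatSniffer <file>");
            return;
        }
        switch (detect(Paths.get(args[0]))) {
            case OLE2:  System.out.println("OLE2: open with HWPFDocument"); break;
            case OOXML: System.out.println("OOXML: open with XWPFDocument"); break;
            default:    System.out.println("Unrecognized format"); break;
        }
    }
}
```

Run against the question's 6.docx, this would report OOXML, which matches the exception text: the data is Office 2007+ XML, so the XWPF part of POI is the one to call.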