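I'm trying to retrieve a .docx file from a database and process it by inspecting its contents. I think my code retrieves the file I want, but it seems I haven't fully understood Apache POI. According to the stack trace, I'm calling the wrong part of POI. Any ideas?
Here is how I load the file: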
public void loadFile(String FileName)
{
    InputStream is = null;
    try
    {
        //Connecting to MYSQL Database
        Class.forName(driver).newInstance();
        con = DriverManager.getConnection(url+dbName,userName,password);
        Statement stmt = (Statement) con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT FILE FROM doccompfiles WHERE FileName = '"+ FileName +"'");
        while(rs.next())
        {
            is = rs.getBinaryStream("FILE");
        }
        HWPFDocument doc = new HWPFDocument(is);
        WordExtractor we = new WordExtractor(doc);
        String[] paragraphs = we.getParagraphText();
        JOptionPane.showMessageDialog(null, "Number of Paragraphs" + paragraphs.length);
        con.close();
    }
    catch(Exception ex)
    {
        ex.printStackTrace();
    }
}
Stack trace:
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
at documentComparisor.Database.loadFile(Database.java:156)
at documentComparisor.Home$5.actionPerformed(Home.java:195)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
at java.awt.EventQueue.access$000(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
Answer 0 (score: 5)
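As you probably know, MS Office documents currently exist in two different formats: the older binary (OLE2) format used by MS Office versions before 2007 (e.g. ".doc" or ".xls"), and the newer XML-based format used by later versions (e.g. ".docx" or ".xlsx").
Apache POI has separate components for the two formats. The key classes for the older MS Office format usually have names starting with "H" (HWPF, HSSF), while the classes for the XML-based format have names starting with "X" (XWPF, XSSF).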
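So in your case, since the file is in the newer format, you should use XWPFDocument instead of HWPFDocument:
XWPFDocument doc = new XWPFDocument(is);
Note that WordExtractor only accepts an HWPFDocument, so the extraction step has to change as well, for example by iterating over doc.getParagraphs() or by using XWPFWordExtractor.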
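If the table can hold both ".doc" and ".docx" files, one option is to look at the first few bytes of the stream before picking a POI class. The following is only a minimal sketch using the plain JDK: the names OfficeFormatSniffer, Format and sniff are made up for illustration and are not part of POI. It relies on the fact that OLE2 files begin with the bytes D0 CF 11 E0 and that OOXML files are ZIP packages beginning with "PK\003\004". A ZIP signature alone does not prove that the file is really a Word document, so treat the result as a hint.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses the Office container format
// from the leading bytes of a file.
class OfficeFormatSniffer {

    enum Format { OLE2, OOXML, UNKNOWN }

    // First four bytes of the OLE2 compound-file signature (old .doc/.xls).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0};
    // ZIP local-file-header signature "PK\003\004" (.docx/.xlsx are ZIP packages).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a buffer holding the leading bytes of a file.
    static Format sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Format.OOXML;
        return Format.UNKNOWN;
    }

    // Peeks at the first four bytes of the stream, then rewinds it so the
    // caller can still hand the complete stream to a document class.
    static Format sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(4);
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break; // stream ended before four bytes were available
            n += r;
        }
        in.reset();
        return sniff(Arrays.copyOf(head, n));
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream returned by rs.getBinaryStream("FILE").
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(zipLike));
        System.out.println(sniff(in)); // prints OOXML
    }
}
```

In loadFile, the stream from rs.getBinaryStream("FILE") would first be wrapped in a BufferedInputStream so that mark/reset is available; an OOXML result would then go to XWPFDocument and an OLE2 result to HWPFDocument.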