How to read doc and docx in Java

Date: 2012-07-05 11:01:07

Tags: java android apache-poi

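First of all, you should know that I have gone through a lot of questions and none of them helped me. I want to be able to read doc and docx documents (by "read" I mean the simplest thing: just getting the text). I have seen some posts about POI and its scratchpad, but I could not get it to work, and most of the time Eclipse could not even build my project...

Could someone give me a code sample for both doc and docx, and tell me the names of (or links to) all the jars I need?

Thanks!

Basically, this is the code: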

try {
    if (getFileExtention(path).equals("docx")) {
        FileInputStream fis = new FileInputStream(path);
        XWPFWordExtractor oleTextExtractor =
            new XWPFWordExtractor(new XWPFDocument(fis));
        return oleTextExtractor.getText();
    } else if (getFileExtention(path).equals("doc")) {
        FileInputStream fis = new FileInputStream(path);
        WordExtractor we = new WordExtractor(fis);
        return we.getText();
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}


return "";

I have the following jars:

dom4j-1.6.1.jar

poi-3.8-20120326.jar

poi-ooxml-3.8-20120326.jar

poi-scratchpad-3.8-20120326.jar

xmlbeans-xmlpublic-2.4.0.jar

I have the following problems:

This one shows up many times during the build:

> [2012-07-05 14:12:53 - iCards] Dx warning: Ignoring InnerClasses
> attribute for an anonymous inner class
> (org.dom4j.xpath.DefaultXPath$1) that doesn't come with an associated
> EnclosingMethod attribute. This class was probably produced by a
> compiler that did not target the modern .class file format. The
> recommended solution is to recompile the class from source, using an
> up-to-date compiler and without specifying any "-target" type options.
> The consequence of ignoring this warning is that reflective operations
> on this class will incorrectly indicate that it is *not* an inner
> class.

Another one (when trying to read docx):

> 07-05 14:17:13.245: W/System.err(4339): java.io.IOException: read failed: EBADF (Bad file number)
> 07-05 14:17:13.255: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:432)
> 07-05 14:17:13.260: W/System.err(4339):     at java.io.FileInputStream.read(FileInputStream.java:179)
> 07-05 14:17:13.265: W/System.err(4339):     at java.io.PushbackInputStream.read(PushbackInputStream.java:196)
> 07-05 14:17:13.270: W/System.err(4339):     at libcore.io.Streams.readFully(Streams.java:81)
> 07-05 14:17:13.275: W/System.err(4339):     at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:230)
> 07-05 14:17:13.280: W/System.err(4339):     at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:51)
> 07-05 14:17:13.285: W/System.err(4339):     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:83)
> 07-05 14:17:13.290: W/System.err(4339):     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:228)
> 07-05 14:17:13.295: W/System.err(4339):     at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
> 07-05 14:17:13.300: W/System.err(4339):     at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:120)
> 07-05 14:17:13.305: W/System.err(4339):     at com.ronEven.iCards.AddRemove.loadFile(AddRemove.java:504)
> 07-05 14:17:13.310: W/System.err(4339):     at com.ronEven.iCards.AddRemove.showDoc(AddRemove.java:495)
> 07-05 14:17:13.315: W/System.err(4339):     at com.ronEven.iCards.AddRemove.setFilePath(AddRemove.java:492)
> 07-05 14:17:13.320: W/System.err(4339):     at com.ronEven.iCards.FileDialog$1.onClick(FileDialog.java:177)
> 07-05 14:17:13.325: W/System.err(4339):     at android.view.View.performClick(View.java:3591)
> 07-05 14:17:13.330: W/System.err(4339):     at android.view.View$PerformClick.run(View.java:14263)
> 07-05 14:17:13.335: W/System.err(4339):     at android.os.Handler.handleCallback(Handler.java:605)
> 07-05 14:17:13.340: W/System.err(4339):     at android.os.Handler.dispatchMessage(Handler.java:92)
> 07-05 14:17:13.345: W/System.err(4339):     at android.os.Looper.loop(Looper.java:137)
> 07-05 14:17:13.345: W/System.err(4339):     at android.app.ActivityThread.main(ActivityThread.java:4507)
> 07-05 14:17:13.345: W/System.err(4339):     at java.lang.reflect.Method.invokeNative(Native Method)
> 07-05 14:17:13.350: W/System.err(4339):     at java.lang.reflect.Method.invoke(Method.java:511)
> 07-05 14:17:13.350: W/System.err(4339):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:790)
> 07-05 14:17:13.350: W/System.err(4339):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:557)
> 07-05 14:17:13.350: W/System.err(4339):     at dalvik.system.NativeStart.main(Native Method)
> 07-05 14:17:13.355: W/System.err(4339): Caused by: libcore.io.ErrnoException: read failed: EBADF (Bad file number)
> 07-05 14:17:13.360: W/System.err(4339):     at libcore.io.Posix.readBytes(Native Method)
> 07-05 14:17:13.360: W/System.err(4339):     at libcore.io.Posix.read(Posix.java:118)
> 07-05 14:17:13.360: W/System.err(4339):     at libcore.io.BlockGuardOs.read(BlockGuardOs.java:149)
> 07-05 14:17:13.360: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:422)
> 07-05 14:17:13.365: W/System.err(4339):     ... 24 more

And the last one, when trying to read doc:

07-05 14:17:37.015: W/System.err(4339): java.io.IOException: read failed: EBADF (Bad file number)
07-05 14:17:37.020: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:432)
07-05 14:17:37.025: W/System.err(4339):     at java.io.FileInputStream.read(FileInputStream.java:179)
07-05 14:17:37.055: W/System.err(4339):     at java.io.PushbackInputStream.read(PushbackInputStream.java:196)
07-05 14:17:37.055: W/System.err(4339):     at java.io.InputStream.read(InputStream.java:163)
07-05 14:17:37.060: W/System.err(4339):     at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:95)
07-05 14:17:37.065: W/System.err(4339):     at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:53)
07-05 14:17:37.070: W/System.err(4339):     at com.ronEven.iCards.AddRemove.loadFile(AddRemove.java:509)
07-05 14:17:37.075: W/System.err(4339):     at com.ronEven.iCards.AddRemove.showDoc(AddRemove.java:495)
07-05 14:17:37.085: W/System.err(4339):     at com.ronEven.iCards.AddRemove.setFilePath(AddRemove.java:492)
07-05 14:17:37.090: W/System.err(4339):     at com.ronEven.iCards.FileDialog$1.onClick(FileDialog.java:177)
07-05 14:17:37.095: W/System.err(4339):     at android.view.View.performClick(View.java:3591)
07-05 14:17:37.100: W/System.err(4339):     at android.view.View$PerformClick.run(View.java:14263)
07-05 14:17:37.105: W/System.err(4339):     at android.os.Handler.handleCallback(Handler.java:605)
07-05 14:17:37.110: W/System.err(4339):     at android.os.Handler.dispatchMessage(Handler.java:92)
07-05 14:17:37.115: W/System.err(4339):     at android.os.Looper.loop(Looper.java:137)
07-05 14:17:37.120: W/System.err(4339):     at android.app.ActivityThread.main(ActivityThread.java:4507)
07-05 14:17:37.120: W/System.err(4339):     at java.lang.reflect.Method.invokeNative(Native Method)
07-05 14:17:37.125: W/System.err(4339):     at java.lang.reflect.Method.invoke(Method.java:511)
07-05 14:17:37.125: W/System.err(4339):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:790)
07-05 14:17:37.130: W/System.err(4339):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:557)
07-05 14:17:37.130: W/System.err(4339):     at dalvik.system.NativeStart.main(Native Method)
07-05 14:17:37.130: W/System.err(4339): Caused by: libcore.io.ErrnoException: read failed: EBADF (Bad file number)
07-05 14:17:37.150: W/System.err(4339):     at libcore.io.Posix.readBytes(Native Method)
07-05 14:17:37.160: W/System.err(4339):     at libcore.io.Posix.read(Posix.java:118)
07-05 14:17:37.160: W/System.err(4339):     at libcore.io.BlockGuardOs.read(BlockGuardOs.java:149)
07-05 14:17:37.160: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:422)
07-05 14:17:37.165: W/System.err(4339):     ... 20 more

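3 answers:

Answer 0 (score: 3)

Tika supports the Microsoft Office formats along with many others. It gives you one common interface for all of them and hides the complexity of maintaining, and learning how to use, a pile of different libraries. It is as simple as calling a function. You can also use the OfficeParser or the OOXMLParser directly.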
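For reference, a minimal sketch of the "one function call" route through Tika's facade class. It assumes the tika-core and tika-parsers jars (plus their dependencies) are on the classpath; the class name TikaTextReader and the helper readText are made up for illustration, and whether Tika runs comfortably on Android is not something this sketch addresses.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaTextReader {

    /**
     * Hypothetical helper: lets Tika detect the file type (doc, docx, ...)
     * and returns the extracted plain text.
     */
    static String readText(File file) throws IOException, TikaException {
        Tika tika = new Tika();
        // parseToString opens the file, picks a parser and closes the stream.
        return tika.parseToString(file);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TikaTextReader /path/to/document.docx
        System.out.println(readText(new File(args[0])));
    }
}
```

The same call is used for doc and docx, so no extension check is needed. Note that the facade caps the length of the returned string by default; Tika.setMaxStringLength changes that limit.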

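Answer 1 (score: 0)

You also have very powerful applications such as the LibreOffice SDK (or OpenOffice 3), with which you can read and manipulate documents (docx, for example) and save them in .txt format.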

Answer 2 (score: 0)

  • To read DOCX documents, we can use XWPFWordExtractor together with XWPFDocument.
  • To read DOC documents, we can use WordExtractor together with HWPFDocument.

You already have the code for DOCX documents:

XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));

But HWPFDocument is missing from your code for DOC documents. Just change this line:

WordExtractor we = new WordExtractor(fis);

into this:

WordExtractor we = new WordExtractor(new HWPFDocument(fis));

As for the jar files, it looks like only poi-ooxml-schemas-3.8-20120326.jar is missing from your build path.
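Putting the two extractors together, a rough sketch of the whole method could look like the following. It assumes the POI 3.8 jars from the question plus poi-ooxml-schemas are on the build path. WordTextReader, extensionOf and readText are illustrative names, not taken from the question. The stack trace in the question shows that WordExtractor also accepts an InputStream directly, so wrapping the stream in HWPFDocument mirrors the suggestion above rather than guaranteeing that the EBADF error goes away.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordTextReader {

    /** Hypothetical helper: lower-cased extension of the file name, or "" if there is none. */
    static String extensionOf(String path) {
        int dot = path.lastIndexOf('.');
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        // A dot inside a directory name does not count as a file extension.
        return dot <= slash ? "" : path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns the plain text of a .doc or .docx file. */
    static String readText(String path) throws IOException {
        String ext = extensionOf(path);
        InputStream in = new FileInputStream(path);
        try {
            if (ext.equals("docx")) {
                // OOXML format: XWPFDocument + XWPFWordExtractor
                return new XWPFWordExtractor(new XWPFDocument(in)).getText();
            } else if (ext.equals("doc")) {
                // Binary format: HWPFDocument + WordExtractor (poi-scratchpad)
                return new WordExtractor(new HWPFDocument(in)).getText();
            }
            throw new IOException("Unsupported extension: " + ext);
        } finally {
            // Close only after the text has been extracted.
            in.close();
        }
    }
}
```

A call such as WordTextReader.readText(path) then replaces the try/catch block in the question; unlike the original, it reports an unsupported extension or an I/O failure to the caller instead of returning an empty string.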