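I am trying to find the length of the text content in a docx file. I can extract the content with the code below, but when the file is too large I get an OutOfMemoryError. Is there a better way?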
OPCPackage opcPackage = OPCPackage.open(file.getAbsolutePath());
XWPFDocument doc = new XWPFDocument(opcPackage);
XWPFWordExtractor we = new XWPFWordExtractor(doc);
String paragraphs = we.getText();
System.out.println("Total Paragraphs: "+paragraphs.length() / 1024);
The error occurs on the following line:
XWPFDocument doc = new XWPFDocument(opcPackage);
This is the exception:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.xmlbeans.impl.store.CharUtil.allocate(CharUtil.java:397)
at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:441)
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.text(Cur.java:2922)
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3043)
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3060)
at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3254)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1802)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXMLNS(PiccoloLexer.java:1293)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXML(PiccoloLexer.java:1261)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4808)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:135)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:107)
at ReadDocFileFromJava.readMyDocument(ReadDocFileFromJava.java:24)
at ReadDocFileFromJava.main(ReadDocFileFromJava.java:15)
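The trace shows the heap running out while XMLBeans builds the whole document tree in memory. One direction I am considering is to skip XWPFDocument entirely and stream the XML instead. Below is a rough sketch using only JDK classes (java.util.zip and StAX). The class name DocxTextLength is made up for illustration, and it assumes the main part sits at the conventional path word/document.xml, which is not guaranteed for every package. It counts only the characters inside w:t elements of the main body, so the number will probably differ from XWPFWordExtractor.getText().length(), which also includes paragraph breaks and possibly text from other parts.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Apache POI.
public class DocxTextLength {

    // WordprocessingML main namespace; run text lives in <w:t> elements.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Counts the characters found inside w:t elements of a document.xml stream. */
    public static long countTextChars(InputStream documentXml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // A docx part should not need DTDs or external entities, so turn them off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(documentXml);
        long total = 0;
        boolean inText = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("t".equals(reader.getLocalName())
                            && W_NS.equals(reader.getNamespaceURI())) {
                        inText = true;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if ("t".equals(reader.getLocalName())
                            && W_NS.equals(reader.getNamespaceURI())) {
                        inText = false;
                    }
                } else if (inText && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA
                        || event == XMLStreamConstants.SPACE)) {
                    // Only the length is accumulated; the text itself is never stored.
                    total += reader.getTextLength();
                }
            }
        } finally {
            reader.close();
        }
        return total;
    }

    /** Opens the docx as a zip archive and streams its main document part. */
    public static long countTextChars(File docx) throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(docx)) {
            // Assumption: the main part is stored at the conventional location.
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return countTextChars(in);
            }
        }
    }
}
```

Calling DocxTextLength.countTextChars(file) in place of the XWPFDocument code should keep memory use roughly constant regardless of file size, since nothing but a running total is retained. I have not verified how closely the result matches the extractor's output on real documents.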