Groovy cannot parse a large XML file on 32-bit Windows

Asked: 2012-11-21 23:57:42

Tags: groovy xml-parsing

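On Windows 7 I set JAVA_OPTS to -Xmx512M. The file is about 15 MB and XML parsing fails. I am running the script from a command terminal (it also times out in groovyConsole). A shorter version of the same file parses correctly on Windows 7.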

BTW, the unchanged XML file and code run correctly on 64-bit SunOS in 24 seconds.

  • Can you tell me what I can do to make this work on Windows 7?

Code snippet:

import groovy.util.XmlParser
import javax.xml.xpath.*
import groovy.time.*

inpXMLFile='c:/EnvFiles/CCC.xml'

entry=new File("$inpXMLFile")
assert   entry.exists()
println " ... file existence validated"

...

def node= new XmlParser().parse( new File( inpXMLFile ) ) // Line 23 

// .... the rest of the script

Full stack trace:

... file existence validated

    Caught: java.net.ConnectException: Connection timed out: connect
      java.net.ConnectException: Connection timed out: connect
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:629)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1291)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1258)
        at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:260)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1151)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1047)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:960)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:488)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
        at compCurrent.run(compCurrent.groovy:23)

1 Answer:

Answer 0 (score: 0)

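The error appears to come from accessing the DTD, not from the size of the file. The trace shows the parser trying to open a network connection while starting the DTD entity, and that connection timing out: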

java.net.ConnectException: Connection timed out: connect
com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity

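You have two options:
1. Remove or comment out the DTD reference and entities in the XML file, or
2. Disable loading of the DTD and external entities on the XmlParser.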

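def parser = new XmlParser()
// Do not load the DTD grammar or fetch the external DTD while parsing
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
// Then parse with the configured parser (inpXMLFile as defined in the question's script)
def node = parser.parse(new File(inpXMLFile))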
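
To check option 2 without the original 15 MB file, here is a minimal self-contained sketch. The inline document and the DTD URL (http://dtd.invalid/root.dtd) are made up for illustration; they stand in for any XML whose DOCTYPE points at a DTD that cannot be reached. The extra disallow-doctype-decl line is an assumption about newer Groovy releases, where XmlParser rejects DOCTYPE declarations by default; on older releases it should be a harmless no-op.

```groovy
import groovy.util.XmlParser  // as in the question; on Groovy 4+ use groovy.xml.XmlParser instead

// Hypothetical input: the DOCTYPE references an external DTD on an unreachable host.
def xml = '''<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "http://dtd.invalid/root.dtd">
<root><item id="1">a</item><item id="2">b</item></root>'''

def parser = new XmlParser()  // non-validating, namespace-aware

// Assumption: newer Groovy versions disallow DOCTYPE by default, so permit it explicitly.
parser.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)

// The two features from the answer: skip the DTD grammar and never fetch the external DTD.
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)

// Parsing should now stay offline instead of waiting on a network connection.
def root = parser.parseText(xml)
println "root element: ${root.name()}, items: ${root.item.size()}"
```

Note that skipping the DTD also means any entities declared only in that DTD are not defined, so a document that relies on them may still need option 1 or a local copy of the DTD.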