无需Web调用的SAX / StAX解析器

时间:2018-11-26 15:42:14

标签: java xml saxparser

我正在使用Java附带的SAX解析器,并且发现一些我不理解/讨厌的东西。

我正在使用SAX解析器来运行数百个XML文档,而这些文档实际上只需要解析其中的几个即可。如果文档中的第一个元素没有正确的命名空间,我将终止并继续下一个文档。

但是文档需要200到600毫秒才能初始化,甚至还没有开始阅读。我不知道为什么在开发之前不小心关闭了互联网并得到了很多这种例外的原因:

java.net.UnknownHostException: hibernate.sourceforge.net
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at java.net.Socket.connect(Socket.java:538)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:647)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1304)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1270)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:264)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1161)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1045)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:959)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at dk.magenta.alfresco.GenerateJavaMojo$Finder.find(GenerateJavaMojo.java:188)
    at dk.magenta.alfresco.GenerateJavaMojo$Finder.visitFile(GenerateJavaMojo.java:215)
    at dk.magenta.alfresco.GenerateJavaMojo$Finder.visitFile(GenerateJavaMojo.java:160)
    at java.nio.file.Files.walkFileTree(Files.java:2670)
    at java.nio.file.Files.walkFileTree(Files.java:2742)
    at dk.magenta.alfresco.GenerateJavaMojo.execute(GenerateJavaMojo.java:110)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)

问题是双重的:为什么sax解析器会花费所有时间连接到Web URL,我可以将其关闭还是有一种可以关闭它的实现?

更新: 我尝试添加这样的EntityResolver。

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(false);
SAXParser saxParser = factory.newSAXParser();
saxParser.getXMLReader().setEntityResolver(new DummyEntityResolver());

....

public class DummyEntityResolver implements EntityResolver {

        public InputSource resolveEntity(String publicID, String systemID)
                throws SAXException {

            return new InputSource(new StringReader(""));
        }
    }

它无效。通话平均仍为400毫秒,当Internet不可用时几乎是即时的。

2 个答案:

答案 0 :(得分:1)

文档中可能包含对该位置保存的外部DTD的引用。如果外部DTD包含需要扩展的实体,那么没有它可能无法解析,但是如果您知道不是这样,或者如果您在本地具有DTD的副本,则可以在EntityResolver上设置XML解析器,用于将引用重定向到本地文件。如果您不想编写自己的EntityResolver,也可以使用Oasis目录将引用重定向到本地副本,但是除非您不担心编写任何设置更复杂的Java。

答案 1 :(得分:0)

解析器肯定会花时间连接到互联网以下载外部DTD。由于这些XML来自已经验证过的标准公共库,因此我建议您通过以下方式禁用验证:

SAXParserFactory saxParserFactory=SAXParserFactory.getInstance();
saxParserFactory.setValidating(false);
SAXParser saxParser=saxParserFactory.newSAXParser();