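I have some Java code that uses SAX to determine the namespace of the root-level element of an XML document. If the namespace is "http://sbgn.org/libsbgn/pd/0.1", it should return version 1. If the namespace is "http://sbgn.org/libsbgn/0.2", the version should be 2. So all the code does is read the first element and set a variable based on its namespace. Here is the code: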
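// Imports for this class and for getVersion(File); they belong at the top of the enclosing file.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

private static class VersionHandler extends DefaultHandler
{
    // -1 means "unknown": no sbgn element seen yet, or an unrecognised namespace
    private int version = -1;

    @Override
    public void startElement (String uri, String localName, String qName, Attributes attributes) throws SAXException
    {
        // Only the <sbgn> element is of interest
        if ("sbgn".equals (qName))
        {
            System.out.println (uri);
            if ("http://sbgn.org/libsbgn/0.2".equals(uri))
            {
                version = 2;
            }
            else if ("http://sbgn.org/libsbgn/pd/0.1".equals(uri))
            {
                version = 1;
            }
            else
            {
                version = -1;
            }
        }
    }

    public int getVersion() { return version; }
}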
public static int getVersion(File file) throws SAXException, FileNotFoundException, IOException
{
    XMLReader xr;
    xr = XMLReaderFactory.createXMLReader();

    // The same handler receives both content events and parse errors
    VersionHandler versionHandler = new VersionHandler();
    xr.setContentHandler(versionHandler);
    xr.setErrorHandler(versionHandler);

    // InputStreamToReader is not a JDK class; it is a helper defined elsewhere
    xr.parse(new InputSource(
        InputStreamToReader.inputStreamToReader(
            new FileInputStream (file))));

    return versionHandler.getVersion();
}
This works, but there are two problems:
java.net.UnknownHostException: www.w3.org
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.<init>(Unknown Source)
    at sun.net.www.http.HttpClient.New(Unknown Source)
    at sun.net.www.http.HttpClient.New(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.sbgn.SbgnVersionFinder.getVersion(SbgnVersionFinder.java:57)
So my questions are:
Edit: here is a link to a sample document that I am trying to parse this way: https://libsbgn.svn.sourceforge.net/svnroot/libsbgn/trunk/test-files/PD/adh.sbgn
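Edit 2: a note on how this bug was resolved. The problem was actually triggered because the wrong document was being parsed instead of the intended one: I was parsing an XHTML document, which really does reference www.w3.org. The solution, of course, is to use the correct document. Still, I found it useful to add this line: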
xr.setEntityResolver(null);
to keep Xerces from going out to the internet when it is completely unnecessary.
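Whether passing null to setEntityResolver actually suppresses the download may depend on the parser implementation. A commonly used alternative is to register a resolver that answers every external entity request with empty content, so the parser has nothing to fetch. The following is only a sketch of that idea, not the code from the project above: the class name OfflineRootNamespace and both rootNamespace methods are hypothetical, and stopping the parse by throwing from startElement is an extra assumption made here because only the root element is needed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of the original code.
final class OfflineRootNamespace {

    private static final class RootHandler extends DefaultHandler {
        String namespace;          // namespace URI of the root element
        boolean rootSeen = false;  // true once startElement has fired

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            namespace = uri;
            rootSeen = true;
            // Nothing after the root start tag is needed, so abort the parse.
            throw new SAXException("root element reached, stopping");
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Supply an empty entity instead of letting the parser open systemId.
            return new InputSource(new StringReader(""));
        }
    }

    static String rootNamespace(InputSource input)
            throws SAXException, IOException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for uri to be populated
        XMLReader xr = factory.newSAXParser().getXMLReader();

        RootHandler handler = new RootHandler();
        xr.setContentHandler(handler);
        xr.setEntityResolver(handler);
        try {
            xr.parse(input);
        } catch (SAXException e) {
            // Swallow only the deliberate abort; rethrow real parse errors.
            if (!handler.rootSeen) {
                throw e;
            }
        }
        return handler.namespace;
    }

    // Convenience overload for in-memory documents.
    static String rootNamespace(String xml) {
        try {
            return rootNamespace(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this resolver in place, a document whose DOCTYPE points at www.w3.org can still be inspected offline. The trade-off is that entities declared only in the skipped DTD are no longer defined, which is harmless here because parsing stops at the root element.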