Handling zipped XML documents with dom4j

Time: 2013-01-15 20:03:35

Tags: java xml zip kml dom4j

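Specifically, I am using dom4j to read KML documents and parse some of the data out of the XML. When I pass the URL to the reader as a string, it is very simple and handles both file system URLs and web URLs: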

SAXReader reader = new SAXReader();
Document document = reader.read(url);

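The problem is that my code sometimes needs to handle KMZ documents, which are basically just zipped XML (KML) documents. Unfortunately, there is no convenient way to handle this with SAXReader. I found all kinds of funky solutions for determining whether a given file is a ZIP file, but my code quickly became bloated and nasty: reading streams, building files, checking for the "magic" hex bytes at the beginning, extracting, and so on.

Is there some quick and clean way to handle this? An easier way to connect to any URL and extract the contents if they are zipped, and otherwise just grab the XML?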

1 answer:

Answer 0 (score: 0)

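Hmm, it looks like KMZDOMLoader can't handle KMZ files on the web. A KMZ may be served dynamically, so it is not always a) a file reference or b) something with a .kmz extension in particular; it has to be identified by its content type.

What I ended up doing was building a URL object and then getting its protocol. I have separate logic for handling local files and documents on the web. Then, inside each logic block, I have to determine whether the document is zipped. The SAXReader read() method accepts an input stream, so I found I could use a ZipInputStream to read the KMZs.

Here is the code I ended up with: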

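import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.zip.ZipInputStream;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

// The members below live in the class KmlMetadataExtractorAdaptor, which the method refers to.

// first four bytes of a ZIP local file header: 'P', 'K', 0x03, 0x04
private static final long ZIP_MAGIC_NUMBERS = 0x504B0304;
// MIME type a server is expected to send for a KMZ resource
private static final String KMZ_CONTENT_TYPE = "application/vnd.google-earth.kmz";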

private Document getDocument(String urlString) throws IOException, DocumentException, URISyntaxException {
        InputStream inputStream = null;
        URL url = new URL(urlString);
        String protocol = url.getProtocol();

        /*
         * Figure out how to get the XML from the URL -- there are 4 possibilities:
         * 
         * 1)  a KML (uncompressed) doc on the filesystem
         * 2)  a KMZ (compressed) doc on the filesystem
         * 3)  a KML (uncompressed) doc on the web
         * 4)  a KMZ (compressed) doc on the web
         */
        if (protocol.equalsIgnoreCase("file")) {
            // the provided input URL points to a file on a file system
            File file = new File(url.toURI());
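            long n;
            // try-with-resources closes the file even if the read fails
            try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
                // readInt() throws EOFException on files shorter than 4 bytes, so check the length first
                n = raf.length() >= 4 ? raf.readInt() : -1;
            }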

            if (n == KmlMetadataExtractorAdaptor.ZIP_MAGIC_NUMBERS) {
                // the file is a KMZ file
                inputStream = new ZipInputStream(new FileInputStream(file));
                // position the stream at the first archive entry, assumed here to be the KML document
                ((ZipInputStream) inputStream).getNextEntry();
            } else {
                // the file is a KML file
                inputStream = new FileInputStream(file);
            }

        } else if (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https")) {
            // the provided input URL points to a web location
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.connect();

            String contentType = connection.getContentType();

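            // getContentType() returns null when the server sends no Content-Type header
            if (contentType != null && contentType.contains(KmlMetadataExtractorAdaptor.KMZ_CONTENT_TYPE)) {
                // the target resource is KMZ
                inputStream = new ZipInputStream(connection.getInputStream());
                // position the stream at the first archive entry, assumed here to be the KML document
                ((ZipInputStream) inputStream).getNextEntry();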
            } else {
                // the target resource is KML
                inputStream = connection.getInputStream();
            }

        }

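        if (inputStream == null) {
            // neither branch above matched, e.g. an ftp: or jar: URL
            throw new IOException("Unsupported URL protocol: " + protocol);
        }

        Document document;
        try {
            document = new SAXReader().read(inputStream);
        } finally {
            // close the stream even if parsing fails
            inputStream.close();
        }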

        return document;
    }
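
For comparison, here is a rough sketch of a different technique from the one above: instead of branching on the protocol and the content type, it sniffs the ZIP magic bytes directly on the stream and then looks for an entry whose name ends in .kml rather than taking the first entry. The class name KmlStreams and the method openKml are hypothetical names made up for this illustration. It assumes the archive starts with a regular local file header (PK\003\004), so empty or spanned archives are not recognized, and it is not the code the answer actually used.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of dom4j or of the answer's code.
public class KmlStreams {

    // First four bytes of a ZIP local file header: 'P', 'K', 0x03, 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /**
     * Returns a stream of KML text whether raw holds plain KML or a KMZ archive.
     * For a KMZ, the returned stream is positioned on the first *.kml entry.
     */
    public static InputStream openKml(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);

        // Peek at the first four bytes, then rewind so nothing is consumed.
        in.mark(ZIP_MAGIC.length);
        byte[] head = new byte[ZIP_MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break; // stream ended before four bytes were available
            }
            read += n;
        }
        in.reset();

        if (read < head.length || !Arrays.equals(head, ZIP_MAGIC)) {
            return in; // not a ZIP archive: treat the content as plain KML
        }

        // ZIP archive: advance until an entry that looks like a KML document.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String name = entry.getName().toLowerCase(Locale.ROOT);
            if (!entry.isDirectory() && name.endsWith(".kml")) {
                return zip; // reads now yield only this entry's bytes
            }
        }
        zip.close();
        throw new IOException("No .kml entry found in KMZ archive");
    }
}
```

Since URL.openStream() works for file: URLs as well as http: and https: URLs, the result could be handed straight to the reader, along the lines of new SAXReader().read(KmlStreams.openKml(url.openStream())), with the caller closing the returned stream afterwards. The trade-off is that this trusts the bytes rather than the Content-Type header.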