Connection reset while streaming XML

Time: 2011-12-15 14:05:00

Tags: java stax socketexception connection-reset apache-commons-httpclient

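My code needs to download a large XML file (500 MB) through a GZIPInputStream and process it, performing some operations for each object. These operations take time to complete, and I have a lot of objects to process. I am using commons http-client 3.1 and StAX.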

public void download(String url) throws HttpException, IOException,
        XMLStreamException, FactoryConfigurationError {

    GetMethod getMethod = new GetMethod(url);
    try {
        httpClient.executeMethod(getMethod);
        // Unwrap the body only if the server reports it as gzip-encoded.
        Header contentEncoding = getMethod.getResponseHeader("Content-Encoding");
        if (contentEncoding != null) {
            String contentEncodingValue = contentEncoding.getValue();
            if (contentEncodingValue.indexOf("gzip") != -1) {
                processStream(new GZIPInputStream(getMethod.getResponseBodyAsStream()));
                return;
            }
        }

        processStream(getMethod.getResponseBodyAsStream());
    } finally {
        // The connection stays open for the whole of processStream().
        getMethod.releaseConnection();
    }
}

protected void processStream(InputStream inputStream) throws XMLStreamException, FactoryConfigurationError {
    XMLStreamReader xmlStreamReader = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
    // parses xml with Stax
    // executes some long operations for each object
}

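When I run the code it works until, after two or three hours, I get a SocketException: Connection reset. It looks like the server has closed the connection, is that right? Is there a way to avoid this error without any change on the server side? If not, how can I handle it so that I don't have to rerun my application from the start?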

com.ctc.wstx.exc.WstxIOException: Connection reset
    at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
    .................
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:182)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:108)
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:221)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
    at java.io.FilterInputStream.read(FilterInputStream.java:90)
    at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1034)
    at com.ctc.wstx.sr.StreamScanner.getNextChar(StreamScanner.java:794)
    at com.ctc.wstx.sr.BasicStreamReader.parseNormalizedAttrValue(BasicStreamReader.java:1900)
    at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3037)
    at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)

2 answers:

Answer 0: (score: 0)

One suggestion is to cache the file locally and process it afterwards.

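That is, your handler just reads the stream and writes it to a temporary file on disk. It then closes the stream and processes the data from the temporary file.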

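This is probably a good approach anyway: even if you could keep the link open, the chance of a network interruption, degraded QoS and so on could make retrieving the file unreliable. You may also be preventing the server from updating the file for the entire duration of your processing, which is a bit antisocial.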
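
A minimal sketch of the spool-then-process idea, assuming Java 7 or later for java.nio.file. The class name LocalSpool and its two methods are hypothetical names chosen for illustration; they are not part of commons http-client or StAX.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: copy the response body to disk first, parse later.
public class LocalSpool {

    /** Copies the whole body to a temp file, closes the stream, returns the file path. */
    public static Path spoolToTempFile(InputStream body) throws IOException {
        Path tempFile = Files.createTempFile("download-", ".tmp");
        try (InputStream in = body) {
            // The network connection is only needed for the duration of this copy.
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
        }
        return tempFile;
    }

    /** Reopens the spooled bytes, unwrapping gzip if the response was gzip-encoded. */
    public static InputStream openSpooled(Path file, boolean gzipped) throws IOException {
        InputStream in = new BufferedInputStream(Files.newInputStream(file));
        return gzipped ? new GZIPInputStream(in) : in;
    }
}
```

In the question's download method this would mean passing getMethod.getResponseBodyAsStream() to spoolToTempFile, releasing the connection, and only then handing openSpooled(...) to processStream. The raw (still compressed) bytes are stored, so the gzip flag comes from the same Content-Encoding check as before. If processing fails midway, the file is still on disk and the download does not have to be repeated.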

Answer 1: (score: 0)

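If you can't copy the XML to the local machine, try to find out whether the connection is timing out. Perhaps processing the XML takes too long and the connection is being reset by one of the intermediate servers.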
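
One way to tell the two cases apart, sketched under the assumption that the parser wraps the I/O error as a cause (as the "Caused by" line in the question's trace suggests): a read timeout on the client side surfaces as java.net.SocketTimeoutException, while a reset from the peer or an intermediary surfaces as java.net.SocketException. The class FailureKind and its classify method are hypothetical names for illustration.

```java
import java.net.SocketException;
import java.net.SocketTimeoutException;

// Hypothetical helper: walk the cause chain and report what kind of socket failure it was.
public class FailureKind {

    public enum Kind { READ_TIMEOUT, SOCKET_ERROR, OTHER }

    public static Kind classify(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            // Local read timeout (only thrown when an SO timeout is configured).
            if (t instanceof SocketTimeoutException) {
                return Kind.READ_TIMEOUT;
            }
            // Covers "Connection reset" and other socket-level failures.
            if (t instanceof SocketException) {
                return Kind.SOCKET_ERROR;
            }
        }
        return Kind.OTHER;
    }
}
```

Calling classify on the exception caught around the parsing loop and logging the result shows whether the client gave up waiting or the other side dropped the connection. The trace in the question would classify as SOCKET_ERROR, which is consistent with a reset by the server or something in between rather than a client-side timeout.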