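My code needs to download a large gzipped XML file (500MB), read it through a GZIPInputStream, and perform some operations for each object in it. These operations take time to complete, and I have a lot of objects to process. I am using commons http-client 3.1 and StAX.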
public void download(String url) throws HttpException, IOException,
        XMLStreamException, FactoryConfigurationError {
    GetMethod getMethod = new GetMethod(url);
    try {
        httpClient.executeMethod(getMethod);
        Header contentEncoding = getMethod.getResponseHeader("Content-Encoding");
        if (contentEncoding != null) {
            String acceptEncodingValue = contentEncoding.getValue();
            if (acceptEncodingValue.indexOf("gzip") != -1) {
                processStream(new GZIPInputStream(getMethod.getResponseBodyAsStream()));
                return;
            }
        }
        processStream(getMethod.getResponseBodyAsStream());
        return;
    } finally {
        getMethod.releaseConnection();
    }
}

protected void processStream(InputStream inputStream) throws XMLStreamException, FactoryConfigurationError {
    XMLStreamReader xmlStreamReader = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
    //parses xml with Stax
    //executes some long operations for each object
}
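When I run the code, it works for two or three hours and then fails with a SocketException: Connection reset.
It looks like the server has closed the connection; is that correct? Is there a way to avoid this error without any changes on the server side? If not, how can I handle it so that I do not have to rerun my application from the beginning?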
com.ctc.wstx.exc.WstxIOException: Connection reset
at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
.................
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:182)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:108)
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:221)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1034)
at com.ctc.wstx.sr.StreamScanner.getNextChar(StreamScanner.java:794)
at com.ctc.wstx.sr.BasicStreamReader.parseNormalizedAttrValue(BasicStreamReader.java:1900)
at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3037)
at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
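Answer 0 (score: 0)
One suggestion is to cache the file locally and process it afterwards.
That is, your handler simply reads the stream and writes it to a temporary file on disk. It then closes the stream and processes the data from the temporary file.
This is probably a good approach because, even if you could keep the link open, the chance of network interruptions, degraded QoS, and so on makes retrieving the file over a long-lived connection unreliable. You may also be preventing the server from updating the file for the whole duration of your processing, which is somewhat antisocial.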
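The approach described in this answer could be sketched roughly as follows. This is only a minimal illustration using JDK classes, not the asker's actual code: the class name `SpoolThenProcess` and the methods `spoolToTempFile` and `processFile` are hypothetical, and counting start elements stands in for the real per-object work. In the question's `download` method, the stream passed to `spoolToTempFile` would be the response body (optionally wrapped in the `GZIPInputStream`), so the HTTP connection is held only for as long as the copy takes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class: download first, process later.
class SpoolThenProcess {

    // Copies the whole stream to a temporary file and closes the stream,
    // so the network connection can be released before the slow work starts.
    static Path spoolToTempFile(InputStream body) throws IOException {
        Path tmp = Files.createTempFile("download-", ".xml");
        try (InputStream in = body) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    // Parses the local copy with StAX. Counting start elements is a
    // placeholder for the long-running per-object operations.
    static int processFile(Path file) throws IOException, XMLStreamException {
        int elements = 0;
        try (InputStream in = Files.newInputStream(file)) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        elements++;
                    }
                }
            } finally {
                reader.close();
            }
        }
        return elements;
    }

    public static void main(String[] args) throws Exception {
        // An in-memory stream stands in for the HTTP response body here.
        byte[] xml = "<items><item/><item/></items>".getBytes(StandardCharsets.UTF_8);
        Path tmp = spoolToTempFile(new ByteArrayInputStream(xml));
        try {
            System.out.println("start elements: " + processFile(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If processing fails halfway, the temporary file is still on disk, so the run can be restarted from the local copy without downloading again. Passing the raw response body instead of the decompressing stream would store the compressed bytes, which saves disk space at the cost of decompressing while parsing.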
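Answer 1 (score: 0)
If you cannot copy the XML to the local machine, check whether the connection is timing out. The XML processing may be taking so long that the connection is reset by one of the intermediate servers.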