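How do I save a URL to a file after confirming its content type?

Time: 2012-11-21 13:43:02

Tags: java url

I am trying to download a file from a URL only if it has a specific content type. The URL may serve either an HTML page or a PDF, and I only want to save the PDF. My attempt is as follows: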

HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("HEAD");
connection.connect();
String contentType = connection.getContentType();

if (contentType.equals("application/pdf")) {
      org.apache.commons.io.FileUtils.copyURLToFile(url, file);
}

The contentType is retrieved correctly, but the call to copyURLToFile(url, file) throws the following exception:

java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(Unknown Source)
at com.sun.net.ssl.internal.ssl.InputRecord.read(Unknown Source)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(Unknown Source)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(Unknown Source)
at com.sun.net.ssl.internal.ssl.AppInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1025)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
at org.apache.commons.io.FileUtils.copyURLToFile(FileUtils.java:848)

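If I remove the lines that fetch the contentType and just call copyURLToFile(url, file), the file downloads and saves successfully. Am I mishandling my HttpURLConnection in a way that causes my connection to be reset?

I have also noticed that if I set a breakpoint on the if (contentType.equals("application/pdf")) line and wait a few seconds, the call to copyURLToFile succeeds without a connection reset. Have I introduced some kind of race condition that always fails?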

2 answers:

Answer 0 (score: 2)

Why not try closing the connection after reading the HEAD response?

   HttpURLConnection connection = (HttpURLConnection) url.openConnection();
   connection.setRequestMethod("HEAD");
   connection.connect();
   String contentType = connection.getContentType();
   // HttpURLConnection has no close() method; disconnect() releases the connection.
   connection.disconnect();

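FileUtils should then open a new connection, which may solve your problem.

Answer 1 (score: 2)

You should use the connection that is already open to read the data: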

org.apache.commons.io.IOUtils.copy(connection.getInputStream(), new FileOutputStream(file));

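There is no need to open another connection. Perhaps the server is resetting the connection?

Edit: This works for me when I use GET instead of HEAD as the request method: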

public static void main(String[] args) throws IOException {
    URL url = new URL("http://www.google.com");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("GET");
    // Reading a header sends the request implicitly.
    String contentType = connection.getContentType();
    System.out.println("content-type: " + contentType);
    // Read the body from the same connection and close the file when done.
    try (FileOutputStream out = new FileOutputStream("/temp/test.html")) {
        IOUtils.copy(connection.getInputStream(), out);
    }
}

Edit: Or like this, if you want to check the headers with a HEAD request first:

URL url = new URL("http://www.google.com");
// First request: HEAD, to inspect the headers without downloading the body.
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("HEAD");
String contentType = connection.getContentType();
System.out.println("content-type: " + contentType);
connection.disconnect();
// Second request: a fresh connection with GET to fetch the body.
connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
try (FileOutputStream out = new FileOutputStream("/temp/test.html")) {
    IOUtils.copy(connection.getInputStream(), out);
}
connection.disconnect();
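
Neither snippet above actually applies the PDF check from the question, so here is a minimal sketch that combines the single-GET approach with that check. The class name PdfDownloadSketch and the methods isPdf and savePdfIfPresent are hypothetical names made up for illustration. It assumes an HTTP or HTTPS URL and uses only the JDK (java.nio.file.Files.copy) rather than Commons IO. The comparison is also made tolerant of a missing header and of parameters such as "; charset=UTF-8", which a plain equals("application/pdf") would not handle.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical helper class, not part of any library.
class PdfDownloadSketch {

    /** Returns true when the Content-Type value names application/pdf, ignoring case and parameters. */
    static boolean isPdf(String contentType) {
        if (contentType == null) {
            return false; // no Content-Type header, or the request failed
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        int semicolon = contentType.indexOf(';');
        String mediaType = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return mediaType.trim().toLowerCase(Locale.ROOT).equals("application/pdf");
    }

    /** Issues a single GET and writes the body to target only if it is a PDF. Returns true if saved. */
    static boolean savePdfIfPresent(URL url, Path target) throws IOException {
        // Assumes an http:// or https:// URL; other schemes would fail this cast.
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        try {
            // Reading the header sends the request; the body is read from the same connection.
            if (!isPdf(connection.getContentType())) {
                return false;
            }
            try (InputStream in = connection.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        } finally {
            connection.disconnect();
        }
    }
}
```

It could be called as PdfDownloadSketch.savePdfIfPresent(url, file.toPath()), with url and file being the variables from the question. The trade-off compared with a separate HEAD request is that the server may already start sending a non-PDF body before the check rejects it, in exchange for avoiding a second connection.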