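I need to fetch the complete HTML of a page generated by an Amazon Web Services (AWS) service. My question is why I cannot retrieve this HTML from the AWS server with Java.
The AWS service provides an initial URL (URL #1) that is normally handled by a browser. URL #1 returns status 302 together with the URL of the page to redirect to (URL #2).
This works fine when a browser follows these URLs. However, when the Java code below runs, the request for URL #2 returns status 404 and the code fails with the runtime error: java.io.FileNotFoundException: "URL #2"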
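For reference, the redirect step described above comes down to two checks on each response: whether the status is a 3xx redirect, and where its Location header points, resolved against the URL that was just requested. A minimal sketch of that logic; the class and method names are hypothetical, and it only covers computing the next URL, not sending the requests:

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper illustrating how a client turns a 3xx response into the next URL.
public final class RedirectResolver {

    private RedirectResolver() {}

    // True for the status codes whose Location header a client is expected to follow.
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location value (absolute or relative) against the URL that returned it,
    // the same way new URL(urlBase, sLocation) does in the code below.
    public static String resolveLocation(String currentUrl, String location) {
        try {
            return new URL(new URL(currentUrl), location).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve Location: " + location, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isRedirect(302));
        System.out.println(resolveLocation("https://example.com/a/b", "/c?x=1"));
        System.out.println(resolveLocation("https://example.com/a/b", "c"));
    }
}
```

A relative Location such as `c` replaces only the last path segment, while one starting with `/` replaces the whole path, which is why the base URL is needed at all.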
The code is as follows:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.security.KeyManagementException;
    import java.security.NoSuchAlgorithmException;
    import java.security.SecureRandom;
    import javax.net.ssl.HttpsURLConnection;
    import javax.net.ssl.SSLContext;
    import org.apache.commons.io.IOUtils; // Apache Commons IO

    URL url = null;
    HttpsURLConnection conHTTPS = null;
    int nResponseCode = -999;
    try {
        // Load url with URL #1.
        url = new URL("URL #1");
        conHTTPS = (HttpsURLConnection) url.openConnection(); // create a connection object for this URL, but don't connect yet
        conHTTPS.setInstanceFollowRedirects(false); // without this, getResponseCode() below returns 404 for this initial URL!
        conHTTPS.setRequestProperty("User-Agent", "Mozilla/5.0 ( compatible ) "); // try to "look like a browser"
        conHTTPS.setRequestProperty("Accept", "*/*");
        conHTTPS.setRequestMethod("GET");
        // Java 7 defaults to TLS 1.0, so this must be done before getResponseCode(), otherwise it throws
        // javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
        SSLContext ssl = null;
        try {
            ssl = SSLContext.getInstance("TLSv1.2");
            ssl.init(null, null, new SecureRandom());
        }
        catch (final NoSuchAlgorithmException | KeyManagementException e) {
            e.printStackTrace();
        }
        conHTTPS.setSSLSocketFactory(ssl.getSocketFactory()); // set up this connection for TLS 1.2
        conHTTPS.connect(); // actually connect to URL #1 (getResponseCode() would do this, but connect explicitly to be sure)
        nResponseCode = conHTTPS.getResponseCode(); // nResponseCode is 302 (a redirect)
    }
    catch (final MalformedURLException e1) {
        e1.printStackTrace();
    }
    catch (final IOException e1) {
        e1.printStackTrace();
    }
    // Use the response to URL #1 to get the redirect URL #2.
    URL urlRedir = null;
    HttpsURLConnection conHTTPSRedir = null;
    try {
        // The earlier conHTTPS.connect() returned the redirect URL in the "Location" header, so read it.
        final URL urlBase = conHTTPS.getURL();
        final String sLocation = conHTTPS.getHeaderField("Location");
        urlRedir = new URL(urlBase, sLocation); // combine the two parts to build URL #2
        conHTTPSRedir = (HttpsURLConnection) urlRedir.openConnection();
        conHTTPSRedir.setInstanceFollowRedirects(false);
        SSLContext sslRedir = null; // was used without being declared
        try {
            sslRedir = SSLContext.getInstance("TLSv1.2");
            sslRedir.init(null, null, new SecureRandom());
        }
        catch (final NoSuchAlgorithmException | KeyManagementException e) {
            e.printStackTrace();
        }
        conHTTPSRedir.setSSLSocketFactory(sslRedir.getSocketFactory());
        conHTTPSRedir.setRequestProperty("User-Agent", "Mozilla/5.0 ( compatible ) ");
        conHTTPSRedir.setRequestProperty("Accept", "*/*");
        conHTTPSRedir.setRequestMethod("GET");
        conHTTPSRedir.connect();
        nResponseCode = conHTTPSRedir.getResponseCode(); // nResponseCode is 404!
        final InputStream inputStream = conHTTPSRedir.getInputStream(); // consistent with the 404, this throws java.io.FileNotFoundException
    }
    catch (final IOException e1) {
        // Read the HTML of the error page returned by the AWS server.
        // getErrorStream() returns null when there is no error body, so guard against that.
        final InputStream is = (conHTTPSRedir != null) ? conHTTPSRedir.getErrorStream() : null;
        // getContentEncoding() returns the Content-Encoding header (e.g. gzip), not a charset, so decode as UTF-8.
        final String sErrorPage = (is != null) ? IOUtils.toString(is, StandardCharsets.UTF_8) : "";
    }

Below is the only informative part of what sErrorPage contains in the catch block above:
The page you tried was not found.
You may have typed the address incorrectly or you may have used an outdated link.
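A side note on decoding that error page: `HttpURLConnection.getContentEncoding()` returns the Content-Encoding header (a compression scheme such as `gzip`), not a character set; the charset, when the server sends one, is a parameter of the Content-Type header. A minimal sketch of extracting it, with a hypothetical class name and a caller-chosen fallback:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: picks the charset for decoding a response body from its Content-Type.
public final class ResponseCharset {

    private ResponseCharset() {}

    // Extracts the charset parameter from a value such as "text/html; charset=ISO-8859-1",
    // returning the fallback when the parameter is absent, malformed, or unsupported.
    public static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // covers IllegalCharsetNameException and UnsupportedCharsetException
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(fromContentType("text/html", StandardCharsets.UTF_8));
    }
}
```

With the connection from the question, this would be used as `IOUtils.toString(is, ResponseCharset.fromContentType(conHTTPSRedir.getContentType(), StandardCharsets.UTF_8))`.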
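If I step through the code in the debugger, stop immediately after urlRedir is computed, copy its value by hand and paste it into a browser, the page appears. So URL #2 itself is fine and works in a browser. But if I let the code continue, it gets a 404 for the same URL!
Can anyone explain what is going wrong here?
Thanks.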
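One difference between a browser and this code that may be worth ruling out, offered as an assumption rather than a diagnosis: a browser stores any cookies set by the 302 response from URL #1 and sends them with the request for URL #2, whereas `HttpURLConnection` only does so when a default `CookieHandler` is installed. A minimal offline sketch of `java.net.CookieManager` doing that bookkeeping; the class name, host names, paths and the `session` cookie are hypothetical placeholders:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch: how a CookieManager carries a cookie from a 302 response to the redirected request.
public final class CookieCarryDemo {

    private CookieCarryDemo() {}

    // Simulates "URL #1 answers with Set-Cookie", then returns the Cookie header values
    // a client using this manager would attach to the request for "URL #2".
    public static List<String> cookieHeaderForRedirect(String url1, String setCookie, String url2) {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        Map<String, List<String>> responseHeaders =
                Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookie));
        try {
            manager.put(URI.create(url1), responseHeaders); // store cookies from the first response
            Map<String, List<String>> requestHeaders =
                    manager.get(URI.create(url2), Collections.<String, List<String>>emptyMap());
            List<String> cookie = requestHeaders.get("Cookie");
            return cookie == null ? Collections.<String>emptyList() : cookie;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // For real connections: install a process-wide handler once, before the first
    // openConnection(); HttpURLConnection then stores and resends cookies itself.
    public static void installDefaultCookieHandler() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    public static void main(String[] args) {
        System.out.println(cookieHeaderForRedirect(
                "https://www.example.com/start",
                "session=abc123; Path=/",
                "https://www.example.com/landing"));
    }
}
```

If this assumption holds, calling `installDefaultCookieHandler()` once at startup should let the existing two-step code send the cookie to URL #2 without further changes; if the 404 persists, cookies are probably not the cause.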