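I am trying to download the source of a page on our intranet. I can open the page in every browser without explicitly logging in.
When I try to fetch the page content with the code below, it fails with the following error: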
public void scrape() throws IOException {
    String httpsURL = "https://myurl.aspx";
    URL myurl = new URL(httpsURL);
    HttpsURLConnection con = (HttpsURLConnection) myurl.openConnection();
    InputStream ins = con.getInputStream(); // breaks here
    InputStreamReader isr = new InputStreamReader(ins);
    BufferedReader in = new BufferedReader(isr);
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        System.out.println(inputLine);
    }
    in.close();
}
Error: Exception in thread "main" java.io.IOException: Server returned HTTP response code: 500 for URL: https://myurl.aspx
It fails specifically on this line: InputStream ins = con.getInputStream();
I'm not sure how to fix this.
Answer
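The first thing to do, as nsfyn55 suggested in a comment, is to inspect the headers your browser sends. Some sites check the User-Agent HTTP header before returning a response. The second thing to do: when using HTTPS, you need to initialize the security layer properly. Take a look at this class: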
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class SSLConfiguration {
    private static boolean isSslInitialized = false;
    // "TLS" requests the JDK's default TLS implementation ("SSL" is a legacy alias)
    private static final String PROTOCOL = "TLS";
    // When true, every certificate and every host name is accepted. This turns off
    // HTTPS server authentication, so only use it against a server you trust.
    public static boolean ACCEPT_ALL_CERTS = true;

    public static synchronized void initializeSSLConnection() {
        if (!isSslInitialized) {
            if (ACCEPT_ALL_CERTS) {
                initInsecure();
            } else {
                initSsl();
            }
            isSslInitialized = true;
        }
    }

    private static void initInsecure() {
        // A trust manager that performs no certificate validation at all
        TrustManager[] trustAllCerts = new TrustManager[] {
            new X509TrustManager() {
                @Override
                public X509Certificate[] getAcceptedIssuers() {
                    return new X509Certificate[0]; // empty array rather than null
                }

                @Override
                public void checkClientTrusted(X509Certificate[] certs, String authType) {
                }

                @Override
                public void checkServerTrusted(X509Certificate[] certs, String authType) {
                }
            }
        };
        try {
            // Install the all-trusting trust manager as the default for HttpsURLConnection
            SSLContext sc = SSLContext.getInstance(PROTOCOL);
            sc.init(null, trustAllCerts, new SecureRandom());
            HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
        } catch (GeneralSecurityException e) {
            // Fail loudly instead of silently keeping the previous socket factory
            throw new RuntimeException(e);
        }
        // Accept any host name, even if it does not match the certificate
        HttpsURLConnection.setDefaultHostnameVerifier((hostname, session) -> true);
    }

    private static void initSsl() {
        try {
            // null key/trust managers: use the JVM's default keystore and truststore
            SSLContext sc = SSLContext.getInstance(PROTOCOL);
            sc.init(null, null, new SecureRandom());
            HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
        } catch (GeneralSecurityException e) {
            throw new RuntimeException(e);
        }
        // The JDK's default HostnameVerifier is kept on purpose. Comparing the URL host
        // with SSLSession.getPeerHost() does not prevent spoofing, because the
        // getPeerHost() value is not authenticated.
    }
}
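Without this initialization the connection may well fail, especially if the site does not have a valid certificate. Call the following once before opening any connection, for example in your class's constructor: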
SSLConfiguration.initializeSSLConnection();
A few other things to consider: after calling openConnection(), it is advisable to add the following:
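con.setRequestMethod("GET"); // or "POST", depending on the request you need
con.setDoInput(true);        // read the response body (this is the default)
con.setDoOutput(true);       // only needed if you write a request body via getOutputStream()
con.setUseCaches(false);     // always fetch a fresh copy of the page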
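A minimal sketch of that setup wrapped in a helper, which also sets the User-Agent and Accept headers recommended in the next paragraph. The class name ConnectionSetup and both header values are assumptions for illustration; copy the values your own browser actually sends. Because it is a plain GET with no request body, the sketch leaves setDoOutput at its default.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class ConnectionSetup {
    // Hypothetical browser-like values; replace them with what your browser sends
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
    static final String ACCEPT = "text/html,application/xhtml+xml,*/*;q=0.8";

    // openConnection() does not contact the server yet, so all of this is
    // configured before any network traffic happens.
    static HttpURLConnection openConfigured(String address) throws IOException {
        // HttpsURLConnection extends HttpURLConnection, so this cast works for https URLs too
        HttpURLConnection con = (HttpURLConnection) URI.create(address).toURL().openConnection();
        con.setRequestMethod("GET");
        con.setDoInput(true);
        con.setUseCaches(false);
        con.setRequestProperty("User-Agent", USER_AGENT);
        con.setRequestProperty("Accept", ACCEPT);
        return con;
    }
}
```

In scrape() you could then write HttpsURLConnection con = (HttpsURLConnection) ConnectionSetup.openConfigured(httpsURL); and keep the rest of the method unchanged.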
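That said, since you are getting a response back from the remote server (the HTTP 500), I am inclined to believe the problem is more about sending the right headers, especially User-Agent and Accept.
If the above does not solve the problem, print the stack trace of the exception and read the error stream sent by the remote server (HttpURLConnection.getErrorStream()) to get a more meaningful error message. If you use Firefox, the Live HTTP Headers add-on is very handy for seeing exactly which headers the browser sends. cURL is also one of the most capable command-line tools for experimenting with HTTP requests.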
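A minimal sketch of reading the error stream, assuming the body is UTF-8 (check the Content-Type header for the real charset); the names ResponseReader and readBody are hypothetical. For status codes of 400 and above, getInputStream() throws the IOException seen in the question, so the sketch checks getResponseCode() first and reads getErrorStream() in that case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

public class ResponseReader {
    // Returns the response body, whether the request succeeded or failed.
    static String readBody(HttpURLConnection con) throws IOException {
        int status = con.getResponseCode(); // sends the request if not already sent
        // For status >= 400 the server's error page is only available here
        InputStream in = status >= 400 ? con.getErrorStream() : con.getInputStream();
        if (in == null) {
            return ""; // the server sent no body
        }
        try (InputStream body = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = body.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // Assumes UTF-8; adjust if the Content-Type header says otherwise
            return out.toString(StandardCharsets.UTF_8.name());
        }
    }
}
```

Because it takes an HttpURLConnection, it also accepts the HttpsURLConnection from the question: printing ResponseReader.readBody(con) instead of calling con.getInputStream() shows what the server actually returned along with the 500.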