我的问题是我正在尝试使用爬虫进入scopus,但它需要我的爬虫通过我的学校代理服务器进入该网站。我尝试过身份验证,但会继续以401
状态回复。
public void testConnection() throws ClientProtocolException, IOException {
DefaultHttpClient httpclient = new DefaultHttpClient();
List<String> authpref = new ArrayList<String>();
authpref.add(AuthPolicy.NTLM);
httpclient.getParams().setParameter(AuthPNames.TARGET_AUTH_PREF, authpref);
NTCredentials creds = new NTCredentials("username","password","ezlibproxy1.ntu.edu.sg","ntu.edu.sg");//this is correct
httpclient.getCredentialsProvider().setCredentials(AuthScope.ANY, creds);
HttpHost target = new HttpHost("ezlibproxy1.ntu.edu.sg", 443, "https");//this is correct
// Make sure the same context is used to execute logically related requests
HttpContext localContext = new BasicHttpContext();
// Execute a cheap method first. This will trigger NTLM authentication
HttpGet httpget = new HttpGet("http://www-scopus-com.ezlibproxy1.ntu.edu.sg/authid/detail.url?authorId=14831850700");
HttpResponse response = httpclient.execute(target, httpget, localContext);
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));
int statusCode = response.getStatusLine().getStatusCode();
System.out.println("Status Code:" + statusCode);
}
状态代码响应为401 (unauthorised)
。
有什么建议吗?