Getting page content after logging in with Java

Time: 2011-09-25 15:25:35

Tags: java httpclient

I want to log in to a website (Yahoo Mail - https://login.yahoo.com/config/login?.src=fpctx&.intl=us&.done=http%3A%2F%2Fwww.yahoo.com%2F) using HttpClient and then retrieve content after logging in (Java). What is wrong with my code?

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.HTTP;

public class TestHttpClient {

    public static void main(String[] args) throws Exception {

        DefaultHttpClient httpclient = new DefaultHttpClient();

        // Initial GET to pick up any session cookies.
        HttpGet httpget = new HttpGet("http://www.yahoo.com/");

        HttpResponse response = httpclient.execute(httpget);
        HttpEntity entity = response.getEntity();

        System.out.println("Login form get: " + response.getStatusLine());
        if (entity != null) {
            // Discard the body so the connection can be reused.
            entity.consumeContent();
        }
        System.out.println("Initial set of cookies:");
        List<Cookie> cookies = httpclient.getCookieStore().getCookies();
        if (cookies.isEmpty()) {
            System.out.println("None");
        } else {
            for (int i = 0; i < cookies.size(); i++) {
                System.out.println("- " + cookies.get(i).toString());
            }
        }

        // POST the credentials to the login endpoint.
        HttpPost httpost = new HttpPost("https://login.yahoo.com/config/login_verify2?.intl=us&.src=ym");

        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("IDToken1", "Yahoo! ID"));
        nvps.add(new BasicNameValuePair("IDToken2", "Password"));

        httpost.setEntity(new UrlEncodedFormEntity(nvps, HTTP.UTF_8));

        response = httpclient.execute(httpost);

        System.out.println("Response " + response.toString());
        entity = response.getEntity();

        System.out.println("Login form get: " + response.getStatusLine());
        if (entity != null) {
            // Print the response body line by line.
            InputStream is = entity.getContent();
            BufferedReader br = new BufferedReader(new InputStreamReader(is));
            String str;
            while ((str = br.readLine()) != null) {
                System.out.println(str);
            }
        }

        System.out.println("Post logon cookies:");
        cookies = httpclient.getCookieStore().getCookies();
        if (cookies.isEmpty()) {
            System.out.println("None");
        } else {
            for (int i = 0; i < cookies.size(); i++) {
                System.out.println("- " + cookies.get(i).toString());
            }
        }
        httpclient.getConnectionManager().shutdown();
    }
}

When I print the output from the HttpEntity, it prints the content of the login page. How do I get the page content after logging in with HttpClient?

1 answer:

Answer 0 (score: 2)

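If you look at the source of the Yahoo login page, you will see that it contains many other parameters that your request does not send.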

<input type="hidden" name=".tries" value="1">
<input type="hidden" name=".src" value="fpctx">
<input type="hidden" name=".md5" value="">
<input type="hidden" name=".hash" value="">
<input type="hidden" name=".js" value="">
<input type="hidden" name=".last" value="">
<input type="hidden" name="promo" value="">
<input type="hidden" name=".intl" value="us">
<input type="hidden" name=".bypass" value="">
<input type="hidden" name=".partner" value="">
<input type="hidden" name=".u" value="a0bljsd77uima">
<input type="hidden" name=".v" value="0">
<input type="hidden" name=".challenge" value="sCm6Z8Bv1vy78LBlEd8dnFsmbit1">
<input type="hidden" name=".yplus" value="">
...

I guess that is why Yahoo treats the login as failed and sends you back to the login page. That login page is the response you are seeing.
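
One way to carry those hidden parameters along is to fetch the login page first, copy every hidden input into the form data, and only then add the credentials. The following is a minimal sketch of that idea using only the standard library. The class and method names (`HiddenFormFields`, `extractHiddenFields`, `toFormBody`) are made up for illustration, the regex parsing assumes double-quoted attributes as in the markup above, and the credential field names `IDToken1`/`IDToken2` are taken from the question and may not match what the real form expects. Sending these fields is not guaranteed to make the login succeed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiddenFormFields {

    // Matches a whole <input ...> tag; its attributes are parsed separately.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches attr="value" pairs (assumption: attributes are double-quoted).
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w-]+)\\s*=\\s*\"([^\"]*)\"");

    /** Collects name/value pairs of all type="hidden" inputs, in page order. */
    public static Map<String, String> extractHiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher tag = INPUT_TAG.matcher(html);
        while (tag.find()) {
            Map<String, String> attrs = new LinkedHashMap<String, String>();
            Matcher attr = ATTRIBUTE.matcher(tag.group());
            while (attr.find()) {
                attrs.put(attr.group(1).toLowerCase(), attr.group(2));
            }
            if ("hidden".equalsIgnoreCase(attrs.get("type")) && attrs.containsKey("name")) {
                String value = attrs.get("value");
                fields.put(attrs.get("name"), value == null ? "" : value);
            }
        }
        return fields;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    public static String toFormBody(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the HTML returned by a GET of the login page.
        String loginPage =
                "<input type=\"hidden\" name=\".tries\" value=\"1\">\n"
              + "<input type=\"hidden\" name=\".intl\" value=\"us\">\n"
              + "<input type=\"text\" name=\"IDToken1\" value=\"\">";

        Map<String, String> form = extractHiddenFields(loginPage);
        // Field names from the question; the real form may use different ones.
        form.put("IDToken1", "Yahoo! ID");
        form.put("IDToken2", "Password");

        System.out.println(toFormBody(form));
    }
}
```

With HttpClient, the same map could be turned into `BasicNameValuePair` entries and passed to `UrlEncodedFormEntity` as in the question. Values such as `.challenge` typically change on each page load, so they would need to be read from a fresh GET made with the same client (and therefore the same cookies) right before the POST.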

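Many websites try to prevent programmatic logins (to keep out bots or for other security reasons), so what you are attempting may be difficult. You could:

  • Use the official Yahoo public APIs wherever possible.
  • Try another Java library that simulates a user browsing (for example HttpUnit or HtmlUnit; there are many others) and "fake" a user, as if they were browsing the Yahoo pages.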