Java Apache HTTP Client - Save cookies for the next request

Time: 2016-12-17 21:17:25

Tags: java cookies web-crawler

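I am just starting out with Java and want to build a simple "backend crawler". For that I need a login function that uses POST, which is not a problem. But how can I save the cookies so that the script does not have to log in again on the next request?

Could you give me an example? I could not find a solution on the internet.

Maybe you can explain how I can use the cookies from the first page for the next request? :)

I hope for your answers.

Sorry for my bad English.

This is my first login test :P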

import java.util.ArrayList;
import java.util.List;
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Login {

    public static String loginAndGetHTML() throws Exception {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        String html;

        // Build the login POST. The variable is named httpPost so it does not shadow the HttpPost class.
        HttpPost httpPost = new HttpPost("http://www.google.com");
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("username", "admin"));
        nvps.add(new BasicNameValuePair("password", "1234"));
        httpPost.setEntity(new UrlEncodedFormEntity(nvps));
        httpPost.addHeader("Referer", "http://tutorials.amazingcode.de/login/index.php");
        httpPost.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0");
        CloseableHttpResponse response = httpclient.execute(httpPost);

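        try {
            // Read the whole response body; toString() consumes the entity,
            // so no separate EntityUtils.consume() call is needed.
            HttpEntity entity = response.getEntity();
            html = EntityUtils.toString(entity);
        } finally {
            // Always release the connection, even if reading the body fails.
            response.close();
        }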

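        // "Falsche Nutzerdaten" is German for "wrong user data"; the check treats it as a failed login.
        if (html.contains("Falsche Nutzerdaten")) {
            throw new Exception("Login fehlgeschlagen"); // German for "login failed"
        }

        return html;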
    }

    public static String parseHTML(String html) throws Exception {
        Document doc = Jsoup.parse(html);
        // Extract the text of the element with id "zahl" (German for "number").
        String zahl = doc.getElementById("zahl").text();
        return zahl;
    }

}

1 Answer:

Answer 0 (score: 0)

You need to read the Set-Cookie header from the response, store its value, and send that value back in the Cookie header of the following requests.
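The header round trip described here can be sketched in plain Java, without any network access. `SimpleCookieJar` and its method names are hypothetical helpers invented for this illustration and are not part of Apache HttpClient. The sample header values are made up too. The sketch keeps only the name=value pair and ignores attributes such as Path, Domain and expiry, which a real crawler would probably need to respect.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not an Apache HttpClient class): remembers cookies between requests. */
final class SimpleCookieJar {
    // Insertion-ordered so the generated Cookie header is predictable.
    private final Map<String, String> cookies = new LinkedHashMap<>();

    /** Stores the name=value pair of one Set-Cookie header value; attributes are ignored. */
    void storeFromSetCookie(String setCookieValue) {
        // The cookie pair is everything before the first ';'.
        String pair = setCookieValue.split(";", 2)[0].trim();
        int eq = pair.indexOf('=');
        if (eq <= 0) {
            return; // no usable name=value pair
        }
        // A later cookie with the same name overwrites the earlier one.
        cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }

    /** Builds the value for the Cookie request header, for example "a=1; b=2". */
    String toCookieHeader() {
        StringJoiner joiner = new StringJoiner("; ");
        cookies.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        SimpleCookieJar jar = new SimpleCookieJar();
        // Made-up Set-Cookie values, as a login response might send them.
        jar.storeFromSetCookie("PHPSESSID=abc123; path=/; HttpOnly");
        jar.storeFromSetCookie("lang=de; Max-Age=3600");
        // Value to send as the Cookie header of the next request.
        System.out.println(jar.toCookieHeader());
    }
}
```

With the HttpClient 4.x code from the question, the values would come from `response.getHeaders("Set-Cookie")` and go out again via `request.addHeader("Cookie", jar.toCookieHeader())`. It may also be worth checking whether manual handling is needed at all: as far as I know, a client created with `HttpClients.createDefault()` keeps its own cookie store, so reusing the same `CloseableHttpClient` instance for the follow-up request should normally send the login cookies automatically.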

See this post for a more detailed explanation of cookies: How are cookies passed in the HTTP protocol?