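I am just starting out with Java and want to build a simple "backend crawler". For that I need a login function that uses POST. That part is not the problem, but how can I save the cookies so that the script does not have to log in again on the next request?
Could you give me an example? I could not find a solution on the internet. Maybe you can explain how I can use the cookie from the first page for the next request? :)
I hope for your answers.
Sorry for my bad English.
This is my first login test :P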
import java.util.ArrayList;
import java.util.List;
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class Login {

    // Posts the login form and returns the HTML of the response page.
    public static String loginAndGetHTML() throws Exception {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        String html;
        HttpPost httpPost = new HttpPost("http://www.google.com");

        // Form fields sent in the POST body
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("username", "admin"));
        nvps.add(new BasicNameValuePair("password", "1234"));
        httpPost.setEntity(new UrlEncodedFormEntity(nvps));
        httpPost.addHeader("Referer", "http://tutorials.amazingcode.de/login/index.php");
        httpPost.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0");

        CloseableHttpResponse response = httpclient.execute(httpPost);
        try {
            HttpEntity entity = response.getEntity();
            html = EntityUtils.toString(entity);
            EntityUtils.consume(entity);
        } finally {
            response.close();
        }

        // The page shows "Falsche Nutzerdaten" (wrong credentials) when the login fails
        if (html.contains("Falsche Nutzerdaten")) {
            throw new Exception("Login fehlgeschlagen");
        }
        return html;
    }

    // Extracts the text of the element with id "zahl" from the page.
    public static String parseHTML(String html) throws Exception {
        Document doc = Jsoup.parse(html);
        String zahl = doc.getElementById("zahl").text();
        return zahl;
    }
}
Answer 0 (score: 0)
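You need to retrieve the Set-Cookie header from the response, store its value (the name=value pair before the first semicolon; the rest are attributes such as Path or Expires), and send the following requests with a Cookie header that carries that value.
See this post for a more detailed explanation of cookies: How are cookies passed in the HTTP protocol?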
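To make the step above concrete, here is a minimal sketch that turns the Set-Cookie values of a login response into the value of a Cookie header for the next request. The class name `CookieHeaderSketch`, the method `toCookieHeader`, and the sample cookie values are made up for illustration and are not part of Apache HttpClient. The sketch also ignores cookie attributes such as expiry, domain, and path, which a real cookie store would honor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class CookieHeaderSketch {

    // Builds one Cookie header value from the raw Set-Cookie header values.
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            // Only the leading name=value pair is sent back to the server;
            // attributes after the first ';' (Path, HttpOnly, ...) are dropped.
            int semicolon = value.indexOf(';');
            String pair = (semicolon >= 0 ? value.substring(0, semicolon) : value).trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Assumed sample values, as a login response might send them
        List<String> fromLogin = Arrays.asList(
                "PHPSESSID=abc123; path=/; HttpOnly",
                "lang=de");
        System.out.println(toCookieHeader(fromLogin)); // prints PHPSESSID=abc123; lang=de
    }
}
```

With the classes from the question, the raw values could be read via `response.getHeaders("Set-Cookie")` (calling `getValue()` on each header), and the result could be attached to the next request with `addHeader("Cookie", ...)`. As an alternative, a client created with `HttpClients.createDefault()` should already keep cookies in its own cookie store, so reusing the same `CloseableHttpClient` instance for the login and the follow-up request may be enough without handling the headers by hand.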