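I want to fetch some data from the following website:
The website contains data about table tennis. The current season can be accessed without logging in, but previous seasons are only available after logging in. For the current season I have already written some code to fetch the data, and it works fine. I am using HttpClient together with HtmlAgilityPack. The code looks like this: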
HttpClient http = new HttpClient();
var response = await http.GetByteArrayAsync(website);
// Decode the full response as UTF-8
string source = Encoding.UTF8.GetString(response);
source = WebUtility.HtmlDecode(source);
HtmlDocument resultat = new HtmlDocument();
resultat.LoadHtml(source);
// ... extract the relevant data by walking resultat.DocumentNode ...
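Now I want to fetch data from the pages that require a login. Does anyone know how to log in to the website and get the data? The login has to be done by clicking "Ergebnishistorie freischalten ..." and then entering a username and password.
Answer 0 (score: 7)
There are several ways to log in to a website, depending on the authentication method the site uses (forms authentication, basic authentication, Windows authentication, etc.). Usually websites use forms authentication.
To log in to a standard forms-authentication website with HttpClient, you need to set a CookieContainer, because the authentication data is stored in cookies.
In your specific case, the login form POSTs to whichever page you are on, over HTTPS. I used https://wttv.click-tt.de/cgi-bin/WebObjects/nuLigaTTDE.woa/wa/teamPortrait?teamtable=1673669&pageState=rueckrunde&championship=SK+Bez.+BB+13%2F14&group=204559 as an example. This is the code that makes the request with HttpClient: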
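To make the role of the CookieContainer concrete, here is a minimal sketch that needs no network access. The host `example.com` and the cookie name and value are made up for illustration; adding the cookie by hand stands in for what the handler does when a login response carries a `Set-Cookie` header.

```csharp
using System;
using System.Net;

// Hypothetical site and cookie values, used only for illustration.
var siteUri = new Uri("https://example.com/");
var cookieContainer = new CookieContainer();

// Stand-in for the handler storing a cookie from a login response.
cookieContainer.Add(siteUri, new Cookie("session", "abc123", "/"));

// A later request to the same host through the same container sends the cookie back.
string header = cookieContainer.GetCookieHeader(new Uri("https://example.com/protected/page"));
Console.WriteLine(header);

// A request to a different host gets no cookie from this container.
string other = cookieContainer.GetCookieHeader(new Uri("https://other.example.org/"));
Console.WriteLine(other.Length);
```

This is why every request after the login has to go through the same container: a fresh HttpClient with its own handler would start with no cookies and look unauthenticated to the server.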
var baseAddress = new Uri("https://wttv.click-tt.de/");
var cookieContainer = new CookieContainer();
using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer })
using (var client = new HttpClient(handler) { BaseAddress = baseAddress })
{
// I usually make an unauthenticated request first, e.g. to the home page.
// This stores some initial cookie values that the server may check in the subsequent login request.
var homePageResult = client.GetAsync("/");
homePageResult.Result.EnsureSuccessStatusCode();
var content = new FormUrlEncodedContent(new[]
{
//the name of the form values must be the name of <input /> tags of the login form, in this case the tag is <input type="text" name="username">
new KeyValuePair<string, string>("username", "username"),
new KeyValuePair<string, string>("password", "password"),
});
var loginResult = client.PostAsync("/cgi-bin/WebObjects/nuLigaTTDE.woa/wa/teamPortrait?teamtable=1673669&pageState=rueckrunde&championship=SK+Bez.+BB+13%2F14&group=204559", content).Result;
loginResult.EnsureSuccessStatusCode();
//make the subsequent web requests using the same HttpClient object
}
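However, many websites use form values that are loaded by JavaScript, or even CAPTCHA controls, and this solution obviously will not work for them. In those cases the login can be done with a WebBrowser control, by filling in the form fields automatically and then clicking the login button. This link has an example: https://social.msdn.microsoft.com/Forums/vstudio/en-US/0b77ca8c-48ce-4fa8-9367-c7491aa359b0/yahoo-login-via-systemnetsockets-namespace?forum=vbgeneral
As a general rule, to find out how the login works on the website you need, use Fiddler: http://www.telerik.com/fiddler. When you click the login button on the website, watch Fiddler and find the login request (usually the first request after you click the login button, and usually a POST request).
Then inspect the request data (select the request and go to the "Inspectors" - "TextView" tab) and try to replicate the request in code.
In the left pane Fiddler lists all intercepted requests; the right pane holds the request and response inspectors (request inspectors at the top, response inspectors at the bottom).
The same code with the older WebRequest class: http://rextester.com/LLP86817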
var cookieContainer = new CookieContainer();
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("https://wttv.click-tt.de/");
request.CookieContainer = cookieContainer;
//set the user agent and accept header values, to simulate a real web browser
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
//SET AUTOMATIC DECOMPRESSION
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
Console.WriteLine("FIRST RESPONSE");
Console.WriteLine();
using (WebResponse response = request.GetResponse())
{
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
Console.WriteLine(sr.ReadToEnd());
}
}
request = (HttpWebRequest)HttpWebRequest.Create("https://wttv.click-tt.de/cgi-bin/WebObjects/nuLigaTTDE.woa/wa/teamPortrait?teamtable=1673669&pageState=rueckrunde&championship=SK+Bez.+BB+13%2F14&group=204559");
//set the cookie container object
request.CookieContainer = cookieContainer;
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
//set method POST and content type application/x-www-form-urlencoded
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
//SET AUTOMATIC DECOMPRESSION
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
//insert your username and password
string data = string.Format("username={0}&password={1}", "username", "password");
byte[] bytes = System.Text.Encoding.UTF8.GetBytes(data);
request.ContentLength = bytes.Length;
using (Stream dataStream = request.GetRequestStream())
{
dataStream.Write(bytes, 0, bytes.Length);
dataStream.Close();
}
Console.WriteLine("LOGIN RESPONSE");
Console.WriteLine();
using (WebResponse response = request.GetResponse())
{
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
Console.WriteLine(sr.ReadToEnd());
}
}
//request = (HttpWebRequest)HttpWebRequest.Create("INTERNAL PROTECTED PAGE ADDRESS");
//After a successful login, you must use the same cookie container for all subsequent requests
//request.CookieContainer = cookieContainer;
//....
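One caveat about the WebRequest version: it formats the credentials straight into the form body, so a username or password containing `&`, `=` or a space would change how the server splits the fields. The sketch below shows one way to guard against that with `WebUtility.UrlEncode`. The credentials are made up, and the field names `username` and `password` are taken from the listing above.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical credentials containing characters that are special in a form body.
string username = "max.mustermann@example.com";
string password = "p&ss=word 1";

// Encode each value separately so '&', '=' and spaces cannot break the field layout.
string data = string.Format("username={0}&password={1}",
    WebUtility.UrlEncode(username),
    WebUtility.UrlEncode(password));

// These bytes are what would be written to the request stream.
byte[] bytes = Encoding.UTF8.GetBytes(data);
Console.WriteLine(data);
```

The rest of the request code (setting `ContentLength` and writing `bytes` to the request stream) stays as it is. The HttpClient version does not need this step, because `FormUrlEncodedContent` encodes the values itself.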