How to remember login details for web crawling using HTTP requests and responses

Time: 2013-02-15 09:33:16

Tags: c# .net httpwebrequest web-scraping web-crawler

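I want to scrape a web page that shows members' email addresses only when you are logged in. With two WebBrowser controls I can do this: one control for logging in, and the other for the pages I need.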

Because I have 1000 URLs, I would rather use HTTP requests and responses, and use regular expressions to extract the output I need.

Is there any way to make the HTTP requests remember the login, so that all member emails are shown?

1 Answer:

Answer 0 (score: 0)

This is simple: log in with the WebBrowser control and, once that is done, save the cookie information in a container like this:

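    // Declared as a class field so that getHtml can reuse it:
    // private CookieContainer yummycookies = new CookieContainer();

    // Document.Cookie is a "name=value; name2=value2" string; it can be null or empty.
    string allCookies = webBrowser.Document.Cookie ?? "";
    foreach (string cookie in allCookies.Split(';'))
    {
        int separator = cookie.IndexOf('=');
        // Skip empty fragments and entries without a usable name.
        if (separator <= 0 || cookie.Substring(0, separator).Trim().Length == 0)
        {
            continue;
        }
        string name = cookie.Substring(0, separator).Trim();
        string value = cookie.Substring(separator + 1).Trim();
        string path = "/";
        string domain = "abc.com"; // replace with the domain of the site you logged in to
        yummycookies.Add(new Cookie(name, value, path, domain));
    }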
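Before sending any requests, it can help to check which cookies actually ended up in the container. The sketch below repeats the parsing step as a standalone helper that does not need a WebBrowser control, then asks the container which `Cookie` header it would send to a given URL. The names `CookieImport` and `FromCookieString`, the sample cookie string, and the URL are illustrative assumptions, not part of the original answer. One caveat: `Document.Cookie` reflects the browser's `document.cookie`, which does not expose cookies marked HttpOnly, so if the site's session cookie is HttpOnly it will not appear here and the later requests may still look logged out.

```csharp
using System;
using System.Net;

public static class CookieImport
{
    // Parses a "name=value; name2=value2" string (the document.cookie format)
    // into a CookieContainer scoped to the given domain.
    public static CookieContainer FromCookieString(string rawCookies, string domain)
    {
        var container = new CookieContainer();
        if (string.IsNullOrEmpty(rawCookies))
        {
            return container;
        }

        foreach (string pair in rawCookies.Split(';'))
        {
            int separator = pair.IndexOf('=');
            if (separator <= 0)
            {
                continue; // empty fragment or missing name
            }
            string name = pair.Substring(0, separator).Trim();
            string value = pair.Substring(separator + 1).Trim();
            if (name.Length == 0)
            {
                continue;
            }
            container.Add(new Cookie(name, value, "/", domain));
        }
        return container;
    }

    public static void Main()
    {
        // Sample data standing in for webBrowser.Document.Cookie (assumption).
        CookieContainer jar = FromCookieString("session=abc123; theme=dark", "abc.com");

        // GetCookieHeader returns the Cookie header that would accompany
        // a request to this URI, which makes the container easy to inspect.
        Console.WriteLine(jar.GetCookieHeader(new Uri("http://abc.com/members")));
    }
}
```

If the printed header is empty or lacks the session cookie, the domain passed to the `Cookie` constructor or an HttpOnly flag on the server side are the first things to look at.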

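The container now holds the login cookies. Use HttpWebRequest with this cookie container: because each request carries the login cookies, the site treats it as already logged in.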

    // Requires: using System; using System.IO; using System.Net;
    public string getHtml(string url)
    {
        string responseData = "";
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Accept = "*/*";
            request.AllowAutoRedirect = true;
            request.UserAgent = "http_requester/0.1";
            request.Timeout = 60000; // milliseconds
            request.Method = "GET";
            request.CookieContainer = yummycookies; // sends the saved login cookies

            // The using blocks release the connection even if reading fails.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                if (response.StatusCode == HttpStatusCode.OK)
                {
                    using (Stream responseStream = response.GetResponseStream())
                    using (StreamReader reader = new StreamReader(responseStream))
                    {
                        responseData = reader.ReadToEnd();
                    }
                }
            }
        }
        catch (Exception e)
        {
            // Note: on failure the error text is returned in place of the HTML.
            responseData = "An error occurred: " + e.Message;
        }
        return responseData;
    }
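For the regular-expression step mentioned in the question, a minimal sketch of pulling email addresses out of the returned HTML could look like the following. The class name `EmailExtractor`, the method `ExtractEmails`, the sample HTML, and the pattern itself are assumptions made for illustration; the pattern is deliberately simple and will not match every valid address.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailExtractor
{
    // Simplified email pattern (an assumption, not a full RFC 5322 matcher).
    private static readonly Regex EmailPattern = new Regex(
        @"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
        RegexOptions.Compiled);

    // Returns the distinct addresses found in the HTML, in order of first appearance.
    public static List<string> ExtractEmails(string html)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (Match match in EmailPattern.Matches(html ?? ""))
        {
            if (seen.Add(match.Value))
            {
                result.Add(match.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Sample markup standing in for a page returned by getHtml (assumption).
        string sample = "<td>alice@example.com</td>"
                      + "<a href=\"mailto:bob@example.org\">Bob</a>"
                      + "<td>ALICE@example.com</td>";

        foreach (string email in ExtractEmails(sample))
        {
            Console.WriteLine(email);
        }
    }
}
```

For the 1000 URLs, one option is to loop over them, pass each result of `getHtml(url)` to `ExtractEmails`, and collect the lists. Since `getHtml` returns an error message instead of HTML when a request fails, it may be worth checking for that text before extracting.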