C# WebBrowser session to WebRequest

Date: 2016-05-12 22:12:45

Tags: c# cookies httprequest session-cookies

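I'm trying to write a program that scrapes URLs from Google. When Google asks for a captcha, the program opens a form that lets the user enter it so the program can continue. Everything works fine up to the captcha: the form opens, the user types the captcha, and the WebBrowser loads the next page. However, the session is not carried over to the WebRequest, so the program loops, reopening the WebBrowser form and asking the user for the captcha again. I have tried copying the cookies from the WebBrowser into the WebRequest's cookie container, but to no avail: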

            foreach (string cookie in f2.webForm.Document.Cookie.Split(';'))
            {
                // Split on the first '=' only; skip empty or malformed entries
                string[] parts = cookie.Split(new[] { '=' }, 2);
                if (parts.Length < 2 || parts[0].Trim().Length == 0)
                    continue;
                string path = "/";
                string domain = "ipv4.google.com";
                cookieJar.Add(new Cookie(parts[0].Trim(), parts[1].Trim(), path, domain));
            }
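A minimal sketch of the same copy step, written a little more defensively. It splits each entry only on the first `=` (a cookie value may itself contain `=`), skips empty or malformed entries, and scopes the cookies to the host of the page they came from instead of a hard-coded domain: a `CookieContainer` does not attach a cookie stored for `ipv4.google.com` to a request for `www.google.com`. The class and method names (`DocumentCookieCopier`, `CopyDocumentCookies`) are hypothetical. Like the loop above, it can only see the cookies that `Document.Cookie` exposes.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original program.
static class DocumentCookieCopier
{
    // Copies a document.cookie-style string ("name=value; name2=value2")
    // into the jar, scoped to the host of pageUri. Returns the number added.
    public static int CopyDocumentCookies(string cookieString, Uri pageUri, CookieContainer jar)
    {
        if (string.IsNullOrEmpty(cookieString))
            return 0;

        int added = 0;
        foreach (string entry in cookieString.Split(';'))
        {
            // Split on the first '=' only; the value may contain '='.
            string[] parts = entry.Split(new[] { '=' }, 2);
            string name = parts[0].Trim();
            if (parts.Length < 2 || name.Length == 0)
                continue; // skip empty or malformed entries

            // Add(Uri, Cookie) fills in the cookie's domain from pageUri.
            jar.Add(pageUri, new Cookie(name, parts[1].Trim(), "/"));
            added++;
        }
        return added;
    }
}
```

In the program above, `pageUri` would presumably be `f2.webForm.Url`, the page the browser ended up on.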

Here is the full code. Keep in mind it was written rather roughly, so don't judge :P

    CookieContainer cookieJar = new CookieContainer();
    for (int i = 0; i <= 30; i += 10)
    {
        string url = "https://www.google.com/search?newwindow=1&q=inurl:test.php" + "&start=" + i;
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        webRequest.CookieContainer = cookieJar; // share one cookie jar across all requests
        Thread.Sleep(1000);
        try
        {
            webRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246";
            webRequest.ProtocolVersion = HttpVersion.Version11;
            webRequest.Method = "GET";
            webRequest.KeepAlive = false;
            webRequest.Timeout = 20000;

            List<string> lLines = new List<string>();
            List<string> lUrls = new List<string>();

            // Send the request once and read the body from that single response
            using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
            using (StreamReader streamReader = new StreamReader(webResponse.GetResponseStream()))
            {
                boxUrl.AppendText("test: " + webResponse.StatusCode + "\n");

                // Get Google's web search and store each line in "lLines"
                string sLine;
                while ((sLine = streamReader.ReadLine()) != null)
                {
                    boxDorks.AppendText(sLine);
                    lLines.Add(sLine);
                }
            }

            // Let's loop through and get all the URLs
            foreach (string s in lLines)
            {
                // Split on href="http; every piece after the first starts right after a link
                string[] findhttp = s.Split(new string[] { "href=\"http" }, StringSplitOptions.None);

                // Parse URL (piece 0 is the text before the first link, so skip it)
                for (int k = 1; k < findhttp.Length; k++)
                {
                    string find = findhttp[k];
                    int endIndex = find.IndexOf("\" onmousedown"); // Find position of closing quote

                    if (endIndex > 0 && find.IndexOf("webcache.googleusercontent.com") < 0 &&
                                        find.IndexOf("support.google.com") < 0 &&
                                        find.IndexOf("robots.txt") < 0 &&
                                        find.IndexOf("translate.google.com") < 0) // we don't want these!
                    {
                        lUrls.Add("http" + find.Substring(0, endIndex));
                    }
                }
            }

            // Output URLs
            foreach (string s in lUrls)
            {
                boxUrl.AppendText("test: " + s + "\n");
            }
        }
        catch (WebException we)
        {
            boxUrl.AppendText("exception: " + we);

            // Timeouts and connection failures carry no response to show
            if (we.Response == null)
                continue;

            // Open form to show google captcha
            Form2 f2 = new Form2(we.Response.ResponseUri.ToString()); // workaround to get webForm.Navigate to work properly
            f2.ShowDialog();

            // Copy cookies from webbrowser to webrequest cookies
            foreach (string cookie in f2.webForm.Document.Cookie.Split(';'))
            {
                // Split on the first '=' only; skip empty or malformed entries
                string[] parts = cookie.Split(new[] { '=' }, 2);
                if (parts.Length < 2 || parts[0].Trim().Length == 0)
                    continue;
                string path = "/";
                string domain = "ipv4.google.com";
                cookieJar.Add(new Cookie(parts[0].Trim(), parts[1].Trim(), path, domain));
            }
        }
    }

Thanks in advance!

1 answer:

Answer 0 (score: 0)

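After some searching I found the solution. It turns out that reading cookies from the WebBrowser the way I posted (through `Document.Cookie`) does not return HttpOnly cookies. Here is a workaround I found, credit to Yoni Couriel! https://ycouriel.blogspot.com/2010/07/webbrowser-and-httpwebrequest-cookies.html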
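A minimal sketch of the general technique, not a copy of the linked post: ask WinINet for the cookies directly with `InternetGetCookieEx` and the `INTERNET_COOKIE_HTTPONLY` flag (0x2000), which includes HttpOnly cookies, then load the result into the `CookieContainer`. It assumes, as the linked workaround does, that the WebBrowser control stores its cookies in the same process's WinINet cookie store, and it only works on Windows. The class and method names (`BrowserCookies`, `GetCookieHeader`, `AddCookieHeader`) and the 8192-character starting buffer are hypothetical choices.

```csharp
using System;
using System.Net;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper; Windows only because of the wininet.dll import.
static class BrowserCookies
{
    // WinINet flag that makes InternetGetCookieEx include HttpOnly cookies.
    private const int INTERNET_COOKIE_HTTPONLY = 0x00002000;

    [DllImport("wininet.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool InternetGetCookieEx(
        string url, string cookieName, StringBuilder cookieData,
        ref int size, int flags, IntPtr reserved);

    // Returns the "name=value; name2=value2" cookies WinINet holds for uri,
    // HttpOnly ones included, or null if there are none or the call fails.
    public static string GetCookieHeader(Uri uri)
    {
        int size = 8192;
        var data = new StringBuilder(size);
        if (!InternetGetCookieEx(uri.ToString(), null, data, ref size, INTERNET_COOKIE_HTTPONLY, IntPtr.Zero))
        {
            // On failure, size may hold the required buffer size; retry once.
            if (size <= 0)
                return null;
            data = new StringBuilder(size);
            if (!InternetGetCookieEx(uri.ToString(), null, data, ref size, INTERNET_COOKIE_HTTPONLY, IntPtr.Zero))
                return null;
        }
        return data.ToString();
    }

    // SetCookies expects Set-Cookie syntax with cookies separated by commas,
    // so the semicolons of the cookie header are swapped for commas first.
    public static void AddCookieHeader(Uri uri, string cookieHeader, CookieContainer jar)
    {
        if (!string.IsNullOrEmpty(cookieHeader))
            jar.SetCookies(uri, cookieHeader.Replace(';', ','));
    }
}
```

In the question's program, this might replace the `Document.Cookie` loop after `f2.ShowDialog()`: with `target = new Uri("https://www.google.com/")`, the host the `HttpWebRequest`s go to, one would call `BrowserCookies.AddCookieHeader(target, BrowserCookies.GetCookieHeader(target), cookieJar)`. Swapping `;` for `,` assumes no cookie value contains a comma.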