Screen scraping a website with an ASP.NET forms login in C#?

Date: 2009-05-23 07:04:47

Tags: c# screen-scraping

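Is it possible to write a screen scraper for a website that is protected by a forms login? I do have access to the site, of course, but I don't know how to log in to it and keep my credentials from C#.

Also, any good examples of screen scrapers in C# would be very welcome.

Has this been done before?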

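2 answers:

Answer 0 (score: 6)

This is fairly simple. You need a custom login (HttpPost) method.

You could come up with something like this. After logging in this way you will have all the cookies you need, and you only have to pass them on to the next HttpWebRequest: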

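// Requires: using System; using System.IO; using System.Net; using System.Text;
public static HttpWebResponse HttpPost(String url, String referer, String userAgent, ref CookieCollection cookies, String postData, out WebHeaderCollection headers, WebProxy proxy)
{
    // Start with an empty collection if the caller has no cookies yet.
    if (cookies == null)
    {
        cookies = new CookieCollection();
    }
    try
    {
        HttpWebRequest http = (HttpWebRequest)WebRequest.Create(url);
        http.Proxy = proxy;
        http.AllowAutoRedirect = true;
        http.Method = "POST";
        http.ContentType = "application/x-www-form-urlencoded";
        http.UserAgent = userAgent;
        // Send the cookies collected so far along with this request.
        http.CookieContainer = new CookieContainer();
        http.CookieContainer.Add(cookies);
        http.Referer = referer;
        // Write the URL-encoded form fields as the request body.
        byte[] dataBytes = Encoding.UTF8.GetBytes(postData);
        http.ContentLength = dataBytes.Length;
        using (Stream postStream = http.GetRequestStream())
        {
            postStream.Write(dataBytes, 0, dataBytes.Length);
        }
        HttpWebResponse httpResponse = (HttpWebResponse)http.GetResponse();
        // Note: these are the request headers; the response headers
        // are available as httpResponse.Headers.
        headers = http.Headers;
        // Keep the cookies set by the server (e.g. the login cookie).
        cookies.Add(httpResponse.Cookies);

        return httpResponse;
    }
    catch (WebException)
    {
        // Network or HTTP error: signal failure with null, as before.
        headers = null;
        return null;
    }
}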
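
To illustrate the "pass the cookies to the next HttpWebRequest" step, here is a minimal sketch of a login followed by an authenticated GET. It is not the method above but a variant that shares one CookieContainer between requests, so the framework stores and resends the login cookie itself. The class name LoginSketch, the example.com URLs and the field names username and password are all assumptions for illustration; a real ASP.NET login page usually uses different field names and also expects hidden fields such as __VIEWSTATE to be posted back.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class LoginSketch
{
    // One container shared by every request, so cookies set by the
    // login response are sent automatically on later requests.
    private static readonly CookieContainer Cookies = new CookieContainer();

    // Builds the URL-encoded body. The field names are hypothetical.
    public static string BuildLoginForm(string user, string password)
    {
        return "username=" + Uri.EscapeDataString(user)
             + "&password=" + Uri.EscapeDataString(password);
    }

    public static string Post(string url, string postData)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = Cookies;
        byte[] body = Encoding.UTF8.GetBytes(postData);
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        return ReadBody(request);
    }

    public static string Get(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.CookieContainer = Cookies;
        return ReadBody(request);
    }

    // Sends the request and returns the response body as text.
    private static string ReadBody(HttpWebRequest request)
    {
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical URLs and credentials; replace them with real ones.
    public static void Run()
    {
        Post("https://example.com/login.aspx", BuildLoginForm("alice", "secret"));
        string html = Get("https://example.com/members/default.aspx");
        Console.WriteLine(html.Length);
    }
}
```

Calling LoginSketch.Run() would post the credentials and then fetch a protected page with the same cookies. If the second request returns the login page again, the usual causes are missing hidden form fields or wrong field names.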

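Answer 1 (score: 4)

Sure, this has been done. I have done it a few times. It is (usually) called screen scraping or web scraping.

You should take a look at this question (and browse the questions tagged "screen-scraping"). Note that scraping is not only about extracting data from a web resource. It also involves submitting data to online forms, so as to mimic what a user does when submitting input (for example, a login form).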
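
As a rough illustration of mimicking a form submission: ASP.NET Web Forms pages render hidden inputs such as __VIEWSTATE and __EVENTVALIDATION, and the server generally expects them to be posted back together with the visible fields. The sketch below, assuming the hidden inputs are rendered with type, name and value attributes in that order, pulls them out of a page with a regular expression and turns a set of fields into a URL-encoded body. The class name FormFields and its methods are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class FormFields
{
    // Matches <input type="hidden" name="..." value="..."> when the
    // attributes appear in this order (an assumption of this sketch).
    private static readonly Regex HiddenInput = new Regex(
        "<input[^>]*type=\"hidden\"[^>]*name=\"([^\"]*)\"[^>]*value=\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Returns the name/value pairs of the hidden inputs found in the page.
    public static Dictionary<string, string> ExtractHidden(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in HiddenInput.Matches(html))
        {
            // Attribute values are HTML-encoded in the page source.
            fields[m.Groups[1].Value] = WebUtility.HtmlDecode(m.Groups[2].Value);
        }
        return fields;
    }

    // Joins the fields as name=value pairs for a form-urlencoded POST body.
    public static string ToPostData(IDictionary<string, string> fields)
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<string, string> pair in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(pair.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(pair.Value));
        }
        return sb.ToString();
    }
}
```

A typical flow would be to GET the login page, call ExtractHidden on its HTML, add the user name and password fields to the dictionary, and POST the result of ToPostData. A regular expression is fragile against markup changes; a real HTML parser such as Html Agility Pack is likely a sturdier choice for anything beyond a quick script.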