Trying to get an authentication cookie using HttpWebRequest

Time: 2012-08-01 15:15:56

Tags: c# httpwebrequest screen-scraping webclient httpwebresponse

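I have to scrape a table from a secured website, but I can't log in to the page and retrieve the authentication token and any other associated cookies. Am I doing something wrong here?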

public NameValueCollection LoginToDatrose()
{
    var loginUriBuilder = new UriBuilder();
    loginUriBuilder.Host = DatroseHostName;
    loginUriBuilder.Path = BuildURIPath(DatroseBasePath, LOGIN_PAGE);
    loginUriBuilder.Scheme = "https";

    var boundary = Guid.NewGuid().ToString();
    var postData = new NameValueCollection();
    postData.Add("LoginName", DatroseUserName);
    postData.Add("Password", DatrosePassword);

    var data = Encoding.ASCII.GetBytes(postData.ToQueryString(false));
    var request = WebRequest.Create(loginUriBuilder.Uri) as HttpWebRequest;
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;
    using (var d = request.GetRequestStream())
    {
        d.Write(data, 0, data.Length);
    }

    var response = request.GetResponse() as HttpWebResponse;
    var responseCookies = new NameValueCollection();
    foreach (var nvp in response.Cookies.OfType<Cookie>())
    {
        responseCookies.Add(nvp.Name, nvp.Value);
    }

    //using (var responseData = response.GetResponseStream())
    //using (var responseReader = new StreamReader(responseData))
    //{
    //    var theResponse = responseReader.ReadToEnd();
    //    Debug.WriteLine(theResponse);
    //}

    return responseCookies;

}

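I'm not getting any values in the returned object, and nothing fails. The value of theResponse (when that block is uncommented) appears to be the HTML of the login page.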

Any assistance is greatly appreciated.

1 answer:

Answer 0: (score: 10)

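OK, the problem here appears to be related to the 302 redirect that occurs after the credentials are posted. HttpWebRequest follows the 302 automatically, so the response you get back is for the redirect target rather than for the 302 itself.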
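For illustration, here is a minimal sketch of how the plain HttpWebRequest from the question could be configured so that the 302 response stays visible. It is not the code I ended up using; the class name LoginRequestSketch and the example URI are made up. It also assigns a CookieContainer, because HttpWebRequest.CookieContainer is null by default and the documentation says a container must be assigned for cookies to show up in HttpWebResponse.Cookies.

```csharp
using System;
using System.Net;

// Hypothetical helper, named for this example only.
public static class LoginRequestSketch
{
    // Builds (but does not send) a form POST whose 302 response is not followed.
    public static HttpWebRequest Create(Uri loginUri)
    {
        var request = (HttpWebRequest)WebRequest.Create(loginUri);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";

        // Stop at the 302 so its Set-Cookie headers are on the response we read.
        request.AllowAutoRedirect = false;

        // Without a container, response.Cookies stays empty.
        request.CookieContainer = new CookieContainer();
        return request;
    }
}
```

The caller would still write the form data to GetRequestStream() and call GetResponse() as in the question; with this setup the response should be the 302 itself.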

In the end, I wound up doing something a little different. First, I subclassed the WebClient class as follows:

public class CookiesAwareWebClient : WebClient
{
    private CookieContainer outboundCookies = new CookieContainer();
    private CookieCollection inboundCookies = new CookieCollection();

    public CookieContainer OutboundCookies
    {
        get
        {
            return outboundCookies;
        }
    }
    public CookieCollection InboundCookies
    {
        get
        { 
            return inboundCookies; 
        }
    }

    public bool IgnoreRedirects { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest)
        {
            (request as HttpWebRequest).CookieContainer = outboundCookies;
            (request as HttpWebRequest).AllowAutoRedirect = !IgnoreRedirects;
        }
        return request;
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        if (response is HttpWebResponse)
        {
            inboundCookies = (response as HttpWebResponse).Cookies ?? inboundCookies;
        }
        return response;
    }
}

This gives me a WebClient class that is cookie-aware and that lets me control whether redirects are followed. I then rewrote my login code as follows:

public NameValueCollection LoginToDatrose()
{
    var loginUriBuilder = new UriBuilder();
    loginUriBuilder.Host = DatroseHostName;
    loginUriBuilder.Path = BuildURIPath(DatroseBasePath, LOGIN_PAGE);
    loginUriBuilder.Scheme = "https";

    var postData = new NameValueCollection();
    postData.Add("LoginName", DatroseUserName);
    postData.Add("Password", DatrosePassword);

    var responseCookies = new NameValueCollection();

    using (var client = new CookiesAwareWebClient())
    {
        client.IgnoreRedirects = true;
        var clientResponse = client.UploadValues(loginUriBuilder.Uri, "POST", postData);
        foreach (var nvp in client.InboundCookies.OfType<Cookie>())
        {
            responseCookies.Add(nvp.Name, nvp.Value);
        }
    }

    return responseCookies;
}

...and everything is going swimmingly.
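The login method above only returns the cookie names and values. To send them with the later scraping requests, one option is to load them back into a CookieContainer. This is a rough sketch under assumptions: CookieReplay and ToCookieContainer are made-up names, siteUri stands for whatever base URI the site uses, and the path, domain, and expiry of the original cookies are lost because only name and value were kept.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// Hypothetical helper, named for this example only.
public static class CookieReplay
{
    // Copies name/value pairs into a container scoped to siteUri.
    public static CookieContainer ToCookieContainer(NameValueCollection cookies, Uri siteUri)
    {
        var container = new CookieContainer();
        foreach (string name in cookies.AllKeys)
        {
            // Cookie values containing ';' or ',' are rejected with a CookieException.
            container.Add(siteUri, new Cookie(name, cookies[name]));
        }
        return container;
    }
}
```

The resulting container could then be assigned to HttpWebRequest.CookieContainer on the follow-up request. Reusing one CookiesAwareWebClient instance for both the login and the scrape should have a similar effect, since its OutboundCookies container is shared across its requests.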