How to POST in a single message with Http Request in .NET?

Time: 2013-11-06 17:15:49

Tags: .net web-crawler login-automation

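I'm trying to crawl a website, but I can't get past the login using the .NET HttpWebRequest/HttpWebResponse classes. Looking at the traffic in Network Monitor, one key difference seems to be that the browser's login sends the payload inside the POST message, while HttpWebRequest sends the payload in a separate message and gets a 301 response. Is there a way to make it use a single message? Or is there something else I'm missing? I have used this code successfully on another site: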
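HTTP does not require the headers and the body to arrive in the same TCP segment, so a server should accept the split that Network Monitor shows. One .NET Framework behavior worth ruling out, though, is `Expect: 100-continue`: by default (`ServicePointManager.Expect100Continue` is true) HttpWebRequest adds that header to requests with a body and waits briefly for an interim `100 Continue` before sending the body, and some servers handle that poorly. If your request does carry that header, a minimal sketch for turning it off follows; `LoginPost.Create` and its parameters are hypothetical names, and the request is only built here, not sent.

```csharp
using System;
using System.Net;

public static class LoginPost
{
    // Hypothetical helper: configures (but does not send) a form POST for the login.
    public static HttpWebRequest Create(string postUrl, string referer, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(postUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.Referer = referer;
        request.CookieContainer = cookies;

        // Do not send "Expect: 100-continue"; the body follows the headers immediately.
        // This setting lives on the ServicePoint for this host, so it applies to
        // requests to the same host rather than process-wide.
        request.ServicePoint.Expect100Continue = false;
        return request;
    }
}
```

Setting `ServicePointManager.Expect100Continue = false` once at startup has the same effect for every host, which may be simpler for a crawler.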

// Send GET to logon site.
SiteRequest = (HttpWebRequest)WebRequest.Create(logonUrl);

SiteRequest.Method = "GET";
SiteRequest.AllowAutoRedirect = AllowRedirect;
SiteRequest.CookieContainer = SiteCookieContainer;
SiteRequest.Referer = logonUrl;

SiteResponse = (HttpWebResponse)SiteRequest.GetResponse();
mainStream = SiteResponse.GetResponseStream();
ReadAndIgnoreAllStreamBytes(mainStream);
mainStream.Close();

// Send POST to logon site.
SiteRequest = (HttpWebRequest)WebRequest.Create(postUrl);
SiteRequest.Method = "POST";
SiteRequest.AllowAutoRedirect = AllowRedirect;
SiteRequest.ContentType = "application/x-www-form-urlencoded";
SiteRequest.CookieContainer = SiteCookieContainer;
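// Cookies from the GET response are already stored in SiteCookieContainer,
// so this Add is redundant (but harmless).
SiteRequest.CookieContainer.Add(SiteResponse.Cookies);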
SiteRequest.Referer = postUrl;
SiteRequest.Timeout = TimeoutMsec;

buffer = Encoding.UTF8.GetBytes(logonPostData);
SiteRequest.ContentLength = buffer.Length;

postStream = SiteRequest.GetRequestStream();
postStream.Write(buffer, 0, buffer.Length);
postStream.Flush();
postStream.Close();

SiteResponse = (HttpWebResponse)SiteRequest.GetResponse();

The HtmlWeb class in HtmlAgilityPack has the same problem.

Thanks.

Update

It turns out I was using the "www.example.com" form of the address instead of "example.com", hence the redirect. But with the correct address I get a 404 Page Not Found error.
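With `AllowAutoRedirect` set to false, HttpWebRequest returns a 3xx response to the caller, and its `Location` header shows where the server wanted to send you (for example from www.example.com to example.com). A 404, by contrast, makes `GetResponse` throw a `WebException` whose `Response` still carries the status code. A minimal sketch for seeing both; `ResponseInfo`, `Describe` and `ResolveLocation` are hypothetical names, and since `Location` may be relative it is resolved against the request URI.

```csharp
using System;
using System.Net;

public static class ResponseInfo
{
    // A Location header may be absolute or relative; resolve it against the request URI.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    // Sends the request and returns "status" or "status -> target" instead of throwing on 4xx/5xx.
    // Assumes request.AllowAutoRedirect == false, so 3xx responses are returned unchanged.
    public static string Describe(HttpWebRequest request)
    {
        HttpWebResponse response;
        try
        {
            response = (HttpWebResponse)request.GetResponse();
        }
        catch (WebException ex) when (ex.Status == WebExceptionStatus.ProtocolError && ex.Response != null)
        {
            // 4xx/5xx: the server still sent a response, so read its status code.
            response = (HttpWebResponse)ex.Response;
        }

        using (response)
        {
            int code = (int)response.StatusCode;
            string location = response.Headers[HttpResponseHeader.Location];
            if (code >= 300 && code < 400 && location != null)
            {
                return code + " -> " + ResolveLocation(request.RequestUri, location).AbsoluteUri;
            }
            return code.ToString();
        }
    }
}
```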

Here is what the browser sends for the POST:

- Http: Request, POST /accounts/signin 
    Command: POST
  + URI: /accounts/signin
    ProtocolVersion: HTTP/1.1
    Accept:  text/html, application/xhtml+xml, */*
    Referer:  http://***.com/accounts/signin
    Accept-Language:  en-US,en;q=0.8,zh-Hans-CN;q=0.7,zh-Hans;q=0.5,zh-Hant-TW;q=0.3,zh-Hant;q=0.2
    UserAgent:  Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; Touch)
  + ContentType:  application/x-www-form-urlencoded
    Accept-Encoding:  gzip, deflate
    Host:  example.com
    ContentLength:  67
    DNT:  1
    Connection:  Keep-Alive
    Cache-Control:  no-cache
  - Cookie:  PHPSESSID=169***efe; lang=en_US; cart=eyJ***wfQ%3D%3D; cartitems=W10%3D; __utma=***; __utmb=***; __utmc=**; __utmz=**
      PHPSESSID: 169***efe
      lang: en_US
      cart: eyJ***wfQ%3D%3D
      cartitems: W10%3D
      __utma: ***
      __utmb: ***
      __utmc: ***
      __utmz: ***

    HeaderEnd: CRLF
  - payload: HttpContentType =  application/x-www-form-urlencoded
     url: 
     email: ***
     password: ***

Here is what I send:

(POST:)

- Http: Request, POST /accounts/signin 
    Command: POST
  + URI: /accounts/signin
    ProtocolVersion: HTTP/1.1
  + ContentType:  application/x-www-form-urlencoded
    Accept:  text/html, application/xhtml+xml, */*
    Accept-Language:  en-US,en;q=0.8,zh-Hans-CN;q=0.7,zh-Hans;q=0.5,zh-Hant-TW;q=0.3,zh-Hant;q=0.2
    Accept-Encoding:  gzip, deflate
    DNT:  1
    Cache-Control:  no-cache
    Referer:  http://***.com/accounts/signin
    Host:  chinesepod.com
  - Cookie:  lang=en_US; cart=eyJ***jowfQ%3D%3D; cartitems=W10%3D; PHPSESSID=944***3e7
      lang: en_US
      cart: eyJ***wfQ%3D%3D
      cartitems: W10%3D
      PHPSESSID: 944***3e7

    ContentLength:  61
    HeaderEnd: CRLF

(Separate payload:)

- Http: HTTP Payload, URL: /accounts/signin 
  - payload: HttpContentType =  application/x-www-form-urlencoded
     url: 
     email: ***
     password: ***
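Both captures carry the same three form fields (url, email, password), yet the browser's ContentLength is 67 while the program's is 61, so it may be worth checking how `logonPostData` is built. The body is the fields joined as `name=value` pairs with `&`, each part percent-encoded. A minimal sketch, with a hypothetical `FormBody` helper; it uses `Uri.EscapeDataString`, which encodes a space as `%20` rather than the `+` a browser typically sends (form decoders generally accept both).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class FormBody
{
    // Hypothetical helper: builds an application/x-www-form-urlencoded body,
    // keeping field order and percent-encoding every name and value.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? "")));
    }

    // Byte count to assign to HttpWebRequest.ContentLength for a UTF-8 body
    // (matches Encoding.UTF8.GetBytes(body).Length).
    public static int ByteLength(string body)
    {
        return Encoding.UTF8.GetByteCount(body);
    }
}
```

An empty `url` field still appears as `url=`, matching the empty `url:` line in both captures.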

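The browser version has these __utXX cookies; I assume those are some kind of tag the browser adds, right? Otherwise, assuming cookie order doesn't matter, the key difference is that the payload is sent separately. Do you see anything else?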
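Comparing the two header lists, the browser request also carries a `UserAgent` header that the program's request lacks; HttpWebRequest sends no User-Agent unless you set one, and some sites treat such requests differently. A minimal sketch that copies the browser's values from the capture onto the request; `BrowserHeaders.Apply` is a hypothetical name, and whether this site actually checks the header is an assumption. Restricted headers such as User-Agent and Accept must be set through their properties rather than `Headers.Add`.

```csharp
using System;
using System.Net;

public static class BrowserHeaders
{
    // Value copied from the browser capture; any realistic browser string should do.
    const string IeUserAgent =
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; Touch)";

    public static void Apply(HttpWebRequest request)
    {
        // Restricted headers: set through properties, not request.Headers.
        request.UserAgent = IeUserAgent;
        request.Accept = "text/html, application/xhtml+xml, */*";

        // Unrestricted headers can go through the header collection.
        request.Headers["Accept-Language"] = "en-US,en;q=0.8,zh-Hans-CN;q=0.7,zh-Hans;q=0.5,zh-Hant-TW;q=0.3,zh-Hant;q=0.2";
    }
}
```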

Thanks.

-John

0 answers