HttpWebRequest - Fake JavaScript Enabled

Asked: 2013-02-24 15:37:38

Tags: c# httpwebrequest web-scraping

I am trying to download a web page in C# with the following code:

var responseData = "";
var strUrl = this.QuerySelector(item, "a[class='url']").Attributes["href"].Value;

request = (HttpWebRequest)WebRequest.Create(strUrl);
request.Method = "GET";
// A GET request has no body, so Content-Type and Content-Length are left unset.
request.CookieContainer = cookies;
request.Timeout = System.Threading.Timeout.Infinite;
request.UserAgent = this.RefreshUserAgent();
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
// Advertise only the encodings that AutomaticDecompression (set below) can decode.
request.Headers.Add("Accept-Encoding", "gzip,deflate");
request.KeepAlive = true;
request.AllowAutoRedirect = false;
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

response = (HttpWebResponse)request.GetResponse();
response.Cookies = request.CookieContainer.GetCookies(request.RequestUri);
var encoding = new System.Text.UTF8Encoding();
var responseReader = new StreamReader(response.GetResponseStream(), encoding, true);

responseData = responseReader.ReadToEnd();
responseReader.Close();
response.Close();

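But the site keeps returning the same page, which asks for JavaScript to be enabled before continuing. I have inspected the traffic with Fiddler: the page simply navigates to itself again, yet I cannot get past this message: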

Before you can move on - please enable JavaScript.

The site is Manta.com, and this is my sample page. Any ideas?

http://www.manta.com/c/mrsywyl/leeds-automotive

2 Answers:

Answer 0: (score: 2)

That is correct. HttpWebRequest only performs HTTP requests. It does not support JavaScript. If you want that behavior, try the WebBrowser control or something like Awesomium.
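
As a rough illustration of the WebBrowser route, here is a minimal sketch rather than a tested solution for Manta.com. It assumes a Windows Forms reference is available, that the interstitial's HTML contains the message quoted in the question, and that the page's script eventually reloads into the real content. The class name RenderedPageDownloader, the Download method, and the timeoutMs parameter are hypothetical names made up for this example.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class RenderedPageDownloader
{
    // Message quoted in the question; assumed to appear in the interstitial's HTML.
    const string GateMessage = "please enable JavaScript";

    // Hypothetical helper: loads url in a WebBrowser control and returns the HTML of
    // the first completed top-level document that does not contain GateMessage,
    // or null if that does not happen within timeoutMs.
    public static string Download(string url, int timeoutMs)
    {
        string html = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            using (var timer = new System.Windows.Forms.Timer { Interval = timeoutMs })
            {
                // Give up after timeoutMs so the message loop cannot run forever.
                timer.Tick += (s, e) => Application.ExitThread();

                browser.DocumentCompleted += (s, e) =>
                {
                    // DocumentCompleted also fires for frames; only handle the top-level page.
                    if (e.Url != browser.Url) return;

                    string text = browser.DocumentText;
                    if (text.IndexOf(GateMessage, StringComparison.OrdinalIgnoreCase) >= 0)
                        return; // still the interstitial; wait for the script to reload the page

                    html = text;
                    Application.ExitThread();
                };

                timer.Start();
                browser.Navigate(url);
                Application.Run(); // pump messages until ExitThread is called
            }
        });

        // The WebBrowser control requires a single-threaded apartment.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return html;
    }
}
```

A call such as RenderedPageDownloader.Download(strUrl, 30000) would block until the page loads or the timeout elapses. Note that the control keeps its own cookies, so the CookieContainer from the question's code is not shared with it.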

Answer 1: (score: 0)

You could try spoofing the UserAgent request header:

request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0";
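
Whether a different User-Agent is enough can be checked by looking for the message from the question in the response body. The sketch below is only an illustration: GateCheck, IsJavaScriptGate, and Fetch are hypothetical names, and the check assumes the interstitial always contains the quoted text. If the gate depends on a script actually running, a header change alone would not be expected to get past it.

```csharp
using System;
using System.IO;
using System.Net;

static class GateCheck
{
    // Returns true when the HTML still looks like the "enable JavaScript" interstitial.
    // Assumption: the interstitial contains the message quoted in the question.
    public static bool IsJavaScriptGate(string html)
    {
        return html != null &&
               html.IndexOf("please enable JavaScript", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Downloads url with the given User-Agent and returns the response body as text.
    public static string Fetch(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        request.CookieContainer = new CookieContainer();
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For example, GateCheck.IsJavaScriptGate(GateCheck.Fetch(strUrl, userAgent)) returning true would indicate that the spoofed header did not help.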