This is the code I use to get the content of a website:
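// Namespaces this method needs (at the top of the file):
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

private string GetContent(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    var content = String.Empty;
    HttpStatusCode statusCode;
    using (var response = request.GetResponse())
    using (var stream = response.GetResponseStream())
    {
        // Read the charset parameter of Content-Type, e.g. "text/html; charset=utf-8",
        // stopping at the next ';' and stripping optional quotes.
        var contentType = response.ContentType;
        Encoding encoding = null;
        if (contentType != null)
        {
            var match = Regex.Match(contentType, @"(?<=charset=)[^;]+");
            if (match.Success)
                encoding = Encoding.GetEncoding(match.Value.Trim().Trim('"'));
        }
        // Fall back to UTF-8 when the server does not declare a charset.
        encoding = encoding ?? Encoding.UTF8;
        statusCode = ((HttpWebResponse)response).StatusCode;
        using (var reader = new StreamReader(stream, encoding))
            content = reader.ReadToEnd();
    }
    return content;
}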
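I tried running this code with the link http://google.com, and it worked. But when I run it with http://batdongsan.com.vn/, it doesn't work and the page just says "Sorry! Something went wrong." I don't know why this happens. How can I get the content of the second link?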
Answer (score: 3)
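It looks like the site checks the User-Agent header. HttpWebRequest doesn't set one by default, so the site returns that error message instead of the page. I added the value my browser sends and was able to get the content of that link. Just add the line that sets UserAgent, like this: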
// ...
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
// Some servers reject requests that carry no User-Agent header, so send a browser-like one.
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36";
var content = String.Empty;
HttpStatusCode statusCode;
// ...
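In newer .NET versions WebRequest/HttpWebRequest are marked obsolete and HttpClient is the recommended replacement. A minimal sketch of the same fix with HttpClient might look like the following. The PageFetcher class and its member names are hypothetical, the User-Agent value is simply the one from the answer, and it assumes the site still filters requests on that header.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class: an HttpClient-based alternative to GetContent.
public static class PageFetcher
{
    // Browser-like User-Agent copied from the answer; any realistic browser string should do.
    public const string BrowserUserAgent =
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36";

    // One shared HttpClient for the whole application instead of one per request.
    private static readonly HttpClient Client = CreateClient();

    public static HttpClient CreateClient()
    {
        var client = new HttpClient();
        // TryAddWithoutValidation stores the header value as-is, without strict parsing,
        // and every request sent through this client will carry it.
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", BrowserUserAgent);
        return client;
    }

    // Downloads the page as a string. The body is decoded using the charset from
    // the Content-Type header when present (UTF-8 otherwise), and a non-success
    // status code raises HttpRequestException.
    public static Task<string> GetContentAsync(string url) => Client.GetStringAsync(url);
}
```

From async code it would be called as `string html = await PageFetcher.GetContentAsync("http://batdongsan.com.vn/");`. The client is shared because creating a new HttpClient per request can exhaust sockets under load.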