HttpClient request like a browser

Asked: 2013-02-22 14:58:51

Tags: c# windows-8 http-headers

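When I call the site www.livescore.com with the HttpClient class, I always get a "500" error. The server is probably blocking requests that come from HttpClient.

1) Is there any other way to get the HTML from the web page?

2) How can I set the headers to get the HTML content?

When I set the headers the way a browser does, I always get strangely encoded content.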

    http_client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
    http_client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
    http_client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
    http_client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");

3) How can I solve this problem? Any suggestions?

I am writing a Windows 8 Metro Style App in C# and using the HttpClient class.

4 Answers:

Answer 0 (score: 53)

Here you go - note that you have to decompress the gzip-encoded result you get back, as per mleroy.

    // Requires: System, System.IO, System.IO.Compression, System.Net.Http, System.Threading.Tasks
    private static readonly HttpClient _HttpClient = new HttpClient();

    private static async Task<string> GetResponse(string url)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url)))
        {
            request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
            // Advertise only gzip, since that is the only encoding decoded below.
            request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip");
            request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
            request.Headers.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");

            using (var response = await _HttpClient.SendAsync(request).ConfigureAwait(false))
            {
                response.EnsureSuccessStatusCode();
                // Gunzip only if the server actually compressed the body.
                var isGzip = response.Content.Headers.ContentEncoding.Contains("gzip");
                using (var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
                using (var bodyStream = isGzip ? new GZipStream(responseStream, CompressionMode.Decompress) : responseStream)
                using (var streamReader = new StreamReader(bodyStream))
                {
                    return await streamReader.ReadToEndAsync().ConfigureAwait(false);
                }
            }
        }
    }

Call it like this:

    var response = await GetResponse("http://www.livescore.com/").ConfigureAwait(false);
    // or, blocking: var response = GetResponse("http://www.livescore.com/").Result;

Answer 1 (score: 21)

You could also try adding compression support:

    var compressclient = new HttpClient(new HttpClientHandler()
    {
        AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip
    });

This also adds the matching Accept-Encoding header to the request.

According to this thread, support is now available in the Windows Store framework: http://social.msdn.microsoft.com/Forums/windowsapps/en-US/429bb65c-5f6b-42e0-840b-1f1ea3626a42/httpclient-data-compression-and-caching?prof=required
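
As a rough sketch of how the handler above could be combined with the browser-like headers from the question (the class name `BrowserLikeClient` and the `Create` helper are made up for illustration, and whether the site accepts the request still depends on the server):

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class BrowserLikeClient
{
    // Hypothetical helper: builds an HttpClient whose handler decompresses responses itself.
    public static HttpClient Create()
    {
        var handler = new HttpClientHandler
        {
            // The handler advertises gzip/deflate and unpacks the body transparently.
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
            // Following redirects is already the default; set here only to make it visible.
            AllowAutoRedirect = true
        };

        var client = new HttpClient(handler);
        // Header values copied from the question; do not add Accept-Encoding by hand here.
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
        client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
        return client;
    }

    public static async Task Main()
    {
        using (var client = Create())
        {
            // The returned string is already decompressed, so no GZipStream is needed.
            string html = await client.GetStringAsync("http://www.livescore.com/");
            Console.WriteLine(html.Length);
        }
    }
}
```

With this setup the manual `GZipStream` step is unnecessary, because the handler decodes the body before `GetStringAsync` returns it.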

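Answer 2 (score: 3)

A few things to note.

  1. The site requires you to send a User-Agent; otherwise it returns an HTTP 500 error.

  2. A GET request to livescore.com is answered with a 302 redirect to livescore.us. You need to handle the redirect or request livescore.us directly.

  3. You need to decompress the gzip-compressed response.

  4. This code uses the .NET 4 Client Profile; I'll leave it to you to determine whether it fits a Windows Store app.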

    var request = (HttpWebRequest)HttpWebRequest.Create("http://www.livescore.com");
    request.AllowAutoRedirect = true;
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17";
    
    string content;
    
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var decompressedStream = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))
    using (var streamReader = new StreamReader(decompressedStream))
    {
        content = streamReader.ReadToEnd();
    }
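
The snippet above always wraps the response in a `GZipStream`, which only works when the server really sent gzip. A minimal sketch of choosing the stream from the `Content-Encoding` value instead is shown below; `BodyDecoder`, `Unwrap` and `ReadText` are hypothetical names, UTF-8 is an assumed text encoding, and the in-memory round trip in `Main` stands in for a real HTTP response:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class BodyDecoder
{
    // Hypothetical helper: wraps the raw body only when it is marked as gzip.
    public static Stream Unwrap(Stream body, string contentEncoding)
    {
        if (string.Equals(contentEncoding, "gzip", StringComparison.OrdinalIgnoreCase))
            return new GZipStream(body, CompressionMode.Decompress);
        return body; // missing or other encoding: pass the bytes through untouched
    }

    // Reads the (possibly compressed) body as UTF-8 text.
    public static string ReadText(Stream body, string contentEncoding)
    {
        using (var decoded = Unwrap(body, contentEncoding))
        using (var reader = new StreamReader(decoded, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Round trip: gzip a string in memory, then decode it again.
        var buffer = new MemoryStream();
        using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
        {
            var bytes = Encoding.UTF8.GetBytes("<html>ok</html>");
            gzip.Write(bytes, 0, bytes.Length);
        }
        buffer.Position = 0;

        Console.WriteLine(ReadText(buffer, "gzip"));
        Console.WriteLine(ReadText(new MemoryStream(Encoding.UTF8.GetBytes("plain")), null));
    }
}
```

With `HttpWebResponse`, the value to pass as `contentEncoding` would come from the response's `ContentEncoding` property.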
    

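Answer 3 (score: 0)

I think you can be fairly sure they have done everything they can to stop developers from screen-scraping.

If I try this code in a standard C# project:

    var request = WebRequest.Create("http://www.livescore.com");
    var response = request.GetResponse();

I get this response: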

The remote server returned an error: (403) Forbidden.