I am trying to read the HTML source of an https URL in C# using the following code:
WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString("https://www.targetUrl.com");
This does not work for me because I get an encoded HTML string back. I tried HtmlAgilityPack, but it did not help.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);
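Answer 0 (score: 3)
That URL returns a gzip-compressed response. WebClient does not support this by default, so you need to drop down to the underlying HttpWebRequest class. Blatantly ripped off from feroze's answer here: Automatically decompress gzip response via WebClient.DownloadData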
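using System;
using System.Net;

// WebClient subclass that asks the underlying HttpWebRequest to decompress gzip/deflate responses.
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests expose AutomaticDecompression; other schemes pass through unchanged.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}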
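For completeness, here is a rough sketch of how the subclass might be used in place of the plain WebClient from the question. The Program wrapper, the try/catch, and printing the length are illustrative assumptions, not part of the original answer; the URL is the placeholder from the question, and the class is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Net;

// Same idea as the class above, repeated so this sketch compiles on its own.
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}

static class Program
{
    static void Main()
    {
        // Placeholder URL taken from the question; substitute the real one.
        const string url = "https://www.targetUrl.com";

        using (var client = new MyWebClient())
        {
            try
            {
                // With automatic decompression enabled, this should be readable HTML.
                string htmlString = client.DownloadString(url);
                Console.WriteLine(htmlString.Length);
            }
            catch (WebException ex)
            {
                // Network or HTTP failure; report it instead of crashing.
                Console.WriteLine("Download failed: " + ex.Message);
            }
        }
    }
}
```

The resulting htmlString can then be passed to doc.LoadHtml(htmlString) exactly as in the question.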
Answer 1 (score: 0)
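// Warning: this callback accepts every server certificate, so TLS certificate validation is effectively disabled.
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString(url);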