Question

I am trying to read the HTML source of an https URL in C# using the following code:

 WebClient webClient = new WebClient();
 string htmlString = webClient.DownloadString("https://www.targetUrl.com");

This does not work for me, because I get an encoded HTML string back. I tried using HtmlAgilityPack, but it did not help.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);

Answer 1

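The URL returns a gzip-compressed response. By default, WebClient does not decompress it, so you need to drop down to the underlying HttpWebRequest class. Blatant ripoff of the answer by feroze here: Automatically decompress gzip response via WebClient.DownloadData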

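using System;
using System.Net;

// WebClient subclass that decompresses gzip/deflate responses transparently.
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests expose AutomaticDecompression; leave other schemes untouched.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}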
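
If subclassing is not an option, the same effect can be approximated by hand: download the raw bytes and inflate them yourself when they look like gzip. The following is only a rough sketch, not part of feroze's answer. BodyDecoder, Decode, and Gzip are hypothetical names, the check relies on the gzip magic bytes 0x1f 0x8b, and it assumes the page is UTF-8. The Gzip method is only there so the round trip can be tried without a network connection.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the original answer.
static class BodyDecoder
{
    // Returns the body as text, inflating it first if it starts with the gzip magic bytes.
    public static string Decode(byte[] body)
    {
        bool isGzip = body.Length >= 2 && body[0] == 0x1f && body[1] == 0x8b;
        if (!isGzip)
        {
            return Encoding.UTF8.GetString(body); // assumption: the page is UTF-8
        }

        using (var input = new MemoryStream(body))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Compresses text with gzip, standing in for what a server would send.
    public static byte[] Gzip(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }
}
```

With a plain WebClient this would be used as BodyDecoder.Decode(webClient.DownloadData(url)). Setting AutomaticDecompression as above is still the simpler route, since it also handles deflate.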

Answer 2

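// Accepts every server certificate, which turns off TLS validation for the whole process; use only for testing.
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString(url);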

How to read HTML source code from an HTTPS URL
