How to read HTML source code from an HTTPS URL

Time: 2013-05-30 06:08:22

Tags: c# html .net html-agility-pack

I am trying to read the HTML source of an HTTPS URL in C# with the following code:

WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString("https://www.targetUrl.com");

This does not work for me, because the HTML string I get back is encoded. I tried using HtmlAgilityPack, but it did not help.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);

2 answers:

Answer 0 (score: 3)

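The URL returns a gzip-compressed response. WebClient does not decompress it by default, so you need to reach down to the underlying HttpWebRequest class. Blatantly stealing feroze's answer from this question: Automatically decompress gzip response via WebClient.DownloadData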

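// Requires: using System; using System.Net;
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        // For http/https addresses the base class creates an HttpWebRequest.
        HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
        // Ask the framework to decompress gzip and deflate response bodies transparently.
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}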
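
For completeness, here is a minimal sketch of how the subclass might be used in place of the plain WebClient from the question. The subclass is repeated so the sketch compiles on its own. The `Program` class, the type check, and the try/catch are illustrative additions, not part of the original answer, and the URL is the question's placeholder, so the request itself is expected to fail until a real address is substituted.

```csharp
using System;
using System.Net;

// Same idea as the subclass above, repeated so this sketch is self-contained.
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests expose AutomaticDecompression; other schemes pass through unchanged.
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.AutomaticDecompression =
                DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}

class Program
{
    static void Main()
    {
        using (var webClient = new MyWebClient())
        {
            try
            {
                // Placeholder URL taken from the question; replace it with the real target.
                string htmlString = webClient.DownloadString("https://www.targetUrl.com");
                Console.WriteLine("Downloaded " + htmlString.Length + " characters.");
            }
            catch (WebException ex)
            {
                // Network, DNS, or TLS failures surface here.
                Console.WriteLine("Download failed: " + ex.Message);
            }
        }
    }
}
```

Once the response has been decompressed, htmlString should contain readable markup that can be passed to doc.LoadHtml(htmlString) as in the question.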

Answer 1 (score: 0)

// Caution: this callback accepts every server certificate, which disables TLS validation.
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString(url);