C# HtmlAgilityPack HtmlDocument() LoadHtml encoding

Time: 2013-08-16 09:21:49

Tags: c# encoding

Uri url = new Uri("http://localhost/rgm.php");
WebClient client = new WebClient();
string html = client.DownloadString(url);

HtmlAgilityPack.HtmlDocument doc23 = new HtmlAgilityPack.HtmlDocument();
doc23.LoadHtml(html);

HtmlNode body23 = doc23.DocumentNode.SelectSingleNode("//body");

string content23 = body23.InnerHtml;

How can I force the web page to be parsed with "UTF-8" encoding?

3 answers:

Answer 0: (score: 5)

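Use WebClient's DownloadData method instead of DownloadString(). DownloadData returns the raw response bytes, so you can decode them as UTF-8 yourself rather than relying on the encoding WebClient picks.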

WebClient client = new WebClient();
var data = client.DownloadData(url);
var html = Encoding.UTF8.GetString(data);
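
To illustrate why decoding the raw bytes explicitly matters, here is a minimal self-contained sketch that needs no network access. The class name EncodingDemo, the helper DecodeUtf8, and the sample markup are hypothetical and not part of the original answer; the byte array stands in for what DownloadData would return.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decode raw response bytes as UTF-8.
    public static string DecodeUtf8(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    public static void Main()
    {
        const string original = "<body>编码</body>";

        // Stand-in for client.DownloadData(url): UTF-8 bytes from a server.
        byte[] raw = Encoding.UTF8.GetBytes(original);

        // Decoding the same bytes with a different encoding garbles the
        // non-ASCII characters.
        string wrong = Encoding.GetEncoding("ISO-8859-1").GetString(raw);
        string right = DecodeUtf8(raw);

        Console.WriteLine(wrong == original); // False
        Console.WriteLine(right == original); // True
    }
}
```

The decoded string can then be passed to LoadHtml as in the question.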

Answer 1: (score: 3)

Use a MemoryStream and pass the encoding to Load()

WebClient client = new WebClient(); 
MemoryStream ms = new MemoryStream(client.DownloadData("http://localhost/rgm.php"));

HtmlDocument doc23 = new HtmlDocument();
doc23.Load(ms, Encoding.UTF8);

HtmlNode body23 = doc23.DocumentNode.SelectSingleNode("//body");
string content23 = body23.InnerHtml;
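
As a rough illustration of the stream-based idea, the sketch below wraps a byte array in a MemoryStream and reads it with an explicit UTF-8 StreamReader. It uses only the .NET base library and does not show how HtmlAgilityPack implements Load internally; the names StreamDecodeDemo and ReadAllAsUtf8 and the sample bytes are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamDecodeDemo
{
    // Hypothetical helper: read a byte buffer through a stream as UTF-8 text.
    public static string ReadAllAsUtf8(byte[] data)
    {
        using (var ms = new MemoryStream(data))
        using (var reader = new StreamReader(ms, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Stand-in for client.DownloadData(...): UTF-8 bytes of a small page.
        byte[] data = Encoding.UTF8.GetBytes("<body>日本語</body>");

        string html = ReadAllAsUtf8(data);
        Console.WriteLine(html == "<body>日本語</body>"); // True
    }
}
```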

Answer 2: (score: 0)

This may be another option: let HtmlWeb download the page and set its OverrideEncoding.

string url = "http://localhost/rgm.php";
var webGet = new HtmlWeb();
webGet.OverrideEncoding = Encoding.UTF8;
var doc23 = webGet.Load(url);

HtmlNode body23 = doc23.DocumentNode.SelectSingleNode("//body");
string content23 = body23.InnerHtml;