Question

This is the page I want to parse (it is encoded in iso-8859-1):

http://www.unione.tn.it/cms-01.00/articolo.asp?IDcms=20488

If you look at the page source, it declares:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This is my code:

using (WebClient client = new WebClient())
{
    client.Headers.Add("user-agent", HttpContext.Current.Request.UserAgent);
    var rawBytes = client.DownloadData(HttpUtility.UrlDecode(resoruce_url));

    var contentType = new ContentType(client.ResponseHeaders["Content-Type"]);
    Response.Write(client.ResponseHeaders);
}

But it prints:

Content-Length: 22967
Content-Type: text/html
Date: Fri, 17 Jan 2014 14:24:17 GMT
Expires: Fri, 17 Jan 2014 14:24:16 GMT
Set-Cookie: Lang=1; expires=Fri, 16-Jan-2015 23:00:00 GMT; path=/,ASPSESSIONIDACCAQTTC=PGGNBKJAHLBBCMELCOMHMHJG; path=/
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Cache-control: private

The Content-Type is just text/html; the iso-8859-1 charset is missing.

Why is that? How can I get the charset?
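For reference, the ContentType object built in the question's code already exposes the header's charset parameter through its CharSet property. Below is a minimal sketch (HeaderCharset and GetCharset are hypothetical names) showing that it comes back null for a bare "text/html" header like the one printed above, so in this case there is simply no charset to read from the response headers:

```csharp
using System;
using System.Net.Mime;

public static class HeaderCharset
{
    // Returns the charset parameter of a Content-Type header value,
    // or null when the server did not send one (e.g. plain "text/html").
    public static string GetCharset(string contentTypeHeader)
    {
        if (string.IsNullOrEmpty(contentTypeHeader))
            return null;

        // ContentType parses "type/subtype; param=value" pairs;
        // CharSet is the value of the "charset" parameter, if present.
        var contentType = new ContentType(contentTypeHeader);
        return contentType.CharSet;
    }
}
```

With the question's headers, HeaderCharset.GetCharset(client.ResponseHeaders["Content-Type"]) would return null, whereas a header of "text/html; charset=iso-8859-1" would yield "iso-8859-1".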

Answer 1

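WebClient isn't losing the charset: as the headers above show, this server sends only "Content-Type: text/html" with no charset parameter, and declares the encoding solely in the page's <meta> tag. So if you want the charset value (the page encoding), you have to read it from the HTML itself. You can try this with HtmlAgilityPack: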

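// Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;)
Encoding encoding = null;
using (WebClient client = new WebClient())
{
    // The charset is only declared in the page's <meta> tag, so fetch the markup first
    string html = client.DownloadString(websiteUrl);
    var doc = new HtmlDocument();
    // Looks for a charset declaration in the markup
    encoding = doc.DetectEncodingHtml(html);
}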
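If you'd rather not add HtmlAgilityPack, here is a minimal sketch that works from the raw bytes returned by DownloadData in the question, assuming an ASCII-compatible page encoding. It prefers the charset from the HTTP header, otherwise scans the first 4 KB for a <meta> charset with a regular expression (a rough heuristic, not a full HTML parser), and falls back to UTF-8 as an assumed default. PageDecoder and DetectEncoding are hypothetical names. On .NET Core / .NET 5+, legacy code pages such as windows-1252 also require registering CodePagesEncodingProvider; iso-8859-1 and utf-8 work out of the box.

```csharp
using System;
using System.Net.Mime;
using System.Text;
using System.Text.RegularExpressions;

public static class PageDecoder
{
    // Matches charset=... inside a <meta> tag, covering both
    // <meta charset="x"> and <meta http-equiv=... content="text/html; charset=x">.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase);

    // Picks the page encoding: HTTP header charset first, then <meta>, then UTF-8.
    public static Encoding DetectEncoding(byte[] rawBytes, string contentTypeHeader)
    {
        string charset = null;
        if (!string.IsNullOrEmpty(contentTypeHeader))
            charset = new ContentType(contentTypeHeader).CharSet;

        if (string.IsNullOrEmpty(charset))
        {
            // Latin-1 maps every byte to one char, so the ASCII markup of an
            // ASCII-compatible page can be scanned before the real encoding is known.
            string head = Encoding.GetEncoding("iso-8859-1")
                .GetString(rawBytes, 0, Math.Min(rawBytes.Length, 4096));
            Match m = MetaCharset.Match(head);
            if (m.Success)
                charset = m.Groups[1].Value;
        }

        try
        {
            return !string.IsNullOrEmpty(charset) ? Encoding.GetEncoding(charset) : Encoding.UTF8;
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name: fall back instead of failing.
            return Encoding.UTF8;
        }
    }
}
```

Plugged into the question's code, this would be: var encoding = PageDecoder.DetectEncoding(rawBytes, client.ResponseHeaders["Content-Type"]); followed by string html = encoding.GetString(rawBytes);. Working from the bytes means the body is decoded once, with the encoding actually detected.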

Why can't I read the charset of this downloaded page from its ResponseHeaders using WebClient?
