Why can't I get the charset from the ResponseHeaders of this page downloaded with WebClient?

Posted: 2014-01-17 14:30:35

Tags: c# .net character-encoding webclient

This is the page I want to parse (it is encoded in iso-8859-1):

http://www.unione.tn.it/cms-01.00/articolo.asp?IDcms=20488

If you look at its source, it declares:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

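This is my code:

// Requires: using System.Net; using System.Net.Mime; using System.Web;
using (WebClient client = new WebClient())
{
    // Forward the caller's User-Agent to the remote server
    client.Headers.Add("user-agent", HttpContext.Current.Request.UserAgent);
    var rawBytes = client.DownloadData(HttpUtility.UrlDecode(resoruce_url));

    // Parse the Content-Type header returned by the server, then dump all response headers
    var contentType = new ContentType(client.ResponseHeaders["Content-Type"]);
    Response.Write(client.ResponseHeaders);
}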

But it prints:

Content-Length: 22967
Content-Type: text/html
Date: Fri, 17 Jan 2014 14:24:17 GMT
Expires: Fri, 17 Jan 2014 14:24:16 GMT
Set-Cookie: Lang=1; expires=Fri, 16-Jan-2015 23:00:00 GMT; path=/,ASPSESSIONIDACCAQTTC=PGGNBKJAHLBBCMELCOMHMHJG; path=/
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Cache-control: private

The Content-Type is just text/html; the iso-8859-1 charset is lost.
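ResponseHeaders only holds the HTTP headers the server actually sent. The meta http-equiv tag is part of the HTML body, so WebClient never merges it into the headers, and this server's Content-Type header carries no charset parameter at all. A minimal sketch using System.Net.Mime.ContentType (the same class the question uses) makes the difference visible; the class and method names are illustrative only.

```csharp
using System;
using System.Net.Mime;

// Illustrative helper, not part of the original code.
static class ContentTypeDemo
{
    // Returns the charset parameter of a Content-Type header value,
    // or null when the server did not include one.
    public static string GetCharset(string headerValue)
    {
        var contentType = new ContentType(headerValue);
        return contentType.CharSet;
    }
}
```

For the header shown above, ContentTypeDemo.GetCharset("text/html") returns null, whereas a header such as "text/html; charset=iso-8859-1" would yield "iso-8859-1". The only place this page declares its charset is the HTML itself.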

Why is that? How can I get it?

1 answer:

Answer 0 (score: 0)

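If you want the charset value (the page encoding), you can try this:

// Assumes the HtmlAgilityPack package, where HtmlDocument.DetectEncodingHtml
// reads the charset declared in the markup (e.g. the <meta> tag).
// Requires: using System.Net; using System.Text; using HtmlAgilityPack;
Encoding encoding = null;
using (WebClient client = new WebClient())
{
    string html = client.DownloadString(websiteUrl);
    var doc = new HtmlDocument();
    encoding = doc.DetectEncodingHtml(html);
}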