This is the page I want to parse (it is encoded as iso-8859-1):
http://www.unione.tn.it/cms-01.00/articolo.asp?IDcms=20488
If you look at its source, you will see:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
This is my code:
using (WebClient client = new WebClient())
{
    client.Headers.Add("user-agent", HttpContext.Current.Request.UserAgent);
    var rawBytes = client.DownloadData(HttpUtility.UrlDecode(resoruce_url));
    var contentType = new ContentType(client.ResponseHeaders["Content-Type"]);
    Response.Write(client.ResponseHeaders);
}
But it prints:
Content-Length: 22967
Content-Type: text/html
Date: Fri, 17 Jan 2014 14:24:17 GMT
Expires: Fri, 17 Jan 2014 14:24:16 GMT
Set-Cookie: Lang=1; expires=Fri, 16-Jan-2015 23:00:00 GMT; path=/,ASPSESSIONIDACCAQTTC=PGGNBKJAHLBBCMELCOMHMHJG; path=/
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Cache-control: private
The Content-Type is just text/html. The iso-8859-1 charset is missing.
Why is that? How can I get it?
Answer 0 (score: 0)
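The headers you printed show that the server sends Content-Type: text/html with no charset parameter; the iso-8859-1 value appears only in the <meta> tag inside the HTML. If you want the charset value (the page encoding), you can try reading it from the document itself: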
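// Assumption: doc is an HtmlAgilityPack HtmlDocument; the original snippet did not define it.
var doc = new HtmlAgilityPack.HtmlDocument();
Encoding encoding = null;
using (WebClient client = new WebClient())
{
    string html = client.DownloadString(websiteUrl);
    // Detects the encoding declared inside the HTML (for example in a <meta> tag).
    encoding = doc.DetectEncodingHtml(html);
}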
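If adding an HTML parser is not an option, the same idea can be sketched by hand: prefer the charset from the HTTP header, and fall back to the <meta> declaration in the downloaded bytes. This is only a rough sketch. The names CharsetSniffer and FindCharsetName are hypothetical, the regex covers only the common <meta> forms, and scanning the first 4096 bytes is an arbitrary choice.

```csharp
using System;
using System.Net.Mime;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class CharsetSniffer
{
    // Matches charset=... inside a <meta> tag, e.g.
    // <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    // or <meta charset="utf-8">.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    // Returns the charset name from the HTTP header if it has one,
    // otherwise from the first <meta> declaration in the body, otherwise null.
    public static string FindCharsetName(string contentTypeHeader, byte[] rawBytes)
    {
        if (!string.IsNullOrEmpty(contentTypeHeader))
        {
            // ContentType.CharSet is null when the header has no charset parameter.
            string fromHeader = new ContentType(contentTypeHeader).CharSet;
            if (!string.IsNullOrEmpty(fromHeader))
                return fromHeader;
        }

        // The <meta> tag and charset names are ASCII, so decoding the start of
        // the body as ASCII is enough to locate the declaration.
        int prefixLength = Math.Min(rawBytes.Length, 4096);
        string head = Encoding.ASCII.GetString(rawBytes, 0, prefixLength);
        Match m = MetaCharset.Match(head);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With the rawBytes from the question, something like `Encoding.GetEncoding(CharsetSniffer.FindCharsetName(client.ResponseHeaders["Content-Type"], rawBytes) ?? "utf-8").GetString(rawBytes)` would then decode the page. The "utf-8" default is an assumption, and Encoding.GetEncoding throws if the declared name is not a known encoding.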