I get the HTML source of this URL: "http://duhoc.dantri.com.vn/du-hoc/30-hoc-sinh-trung-tuyen-dai-hoc-my-nam-2018-chia-se-bi-kip-thanh-cong-20180418093640358.htm" using:
```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

private static string getPageSource(string url)
{
    try
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "SO/1.0";
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        if (response.StatusCode == HttpStatusCode.OK)
        {
            Stream receiveStream = response.GetResponseStream();
            StreamReader readStream = null;
            //if (response.CharacterSet == null)
            //{
            // Decode the response body as UTF-8 regardless of the server's charset.
            readStream = new StreamReader(receiveStream, Encoding.UTF8);
            //}
            string data = readStream.ReadToEnd();
            response.Close();
            readStream.Close();
            return data;
        }
    }
    catch (Exception ex)
    {
        // WriteLog is my own logging helper.
        WriteLog("Exception get Page Source, Ex = " + ex.ToString());
    }
    return null;
}
```
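The commented-out `response.CharacterSet` check hints at choosing the decoder from the charset the server reports instead of hard-coding UTF-8. A minimal sketch of that idea, assuming a hypothetical helper class `CharsetHelper` (not part of the original code) that falls back to UTF-8 when the charset is missing or unknown to the runtime:

```csharp
using System;
using System.Text;

// Hypothetical helper: pick a decoder from the charset reported by the server
// (for example HttpWebResponse.CharacterSet), falling back to UTF-8.
public static class CharsetHelper
{
    public static Encoding PickEncoding(string charset)
    {
        // No charset reported: assume UTF-8.
        if (string.IsNullOrWhiteSpace(charset))
        {
            return Encoding.UTF8;
        }
        try
        {
            // Some servers quote the value, e.g. charset="utf-8".
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws ArgumentException for names the
            // runtime does not know; fall back to UTF-8.
            return Encoding.UTF8;
        }
    }
}
```

Usage would be `new StreamReader(receiveStream, CharsetHelper.PickEncoding(response.CharacterSet))`. Note that the encoding only affects how raw bytes become characters; text the page itself writes as HTML character references such as `&#244;` is pure ASCII and comes through unchanged whichever encoding is chosen.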
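The browser shows the page title as "30 học sinh trúng tuyển đại học Mỹ năm 2018 chia sẻ “bí kíp” thành công", but when I get the HTML source of the page by calling the method above, the title becomes `30 học sinh trúng tuyển đại học Mỹ năm 2018 chia sẻ “bí kí p “thà nh c&#244;ng`. To fix this, I changed UTF8 to: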
```csharp
Encoding encode = System.Text.Encoding.GetEncoding(1255);
```
I also tried UTF7 and UTF32, but none of them worked. So what am I doing wrong?
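One way to check whether the garbled title is an encoding problem at all: sequences like `&#244;` are HTML numeric character references (244 is U+00F4, "ô"), which the browser decodes when rendering, so no `Encoding` choice will turn them into letters. A rough sketch, assuming a hypothetical helper `TitleDecoder.ExtractTitle` (not part of the original code) that pulls the `<title>` out of the string returned by `getPageSource` with a simple regex and decodes it with `WebUtility.HtmlDecode`:

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: extract the <title> text and decode HTML character
// references (&#244;, &#xF4;, &quot;, ...) into plain characters.
public static class TitleDecoder
{
    public static string ExtractTitle(string html)
    {
        if (html == null)
        {
            return null;
        }
        // A simple regex is enough for a quick check; a real HTML parser is
        // more robust for arbitrary pages.
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!m.Success)
        {
            return null;
        }
        // HtmlDecode turns numeric and named references into characters.
        return WebUtility.HtmlDecode(m.Groups[1].Value.Trim());
    }
}
```

If the decoded title matches what the browser shows, the `StreamReader` encoding was already correct and only the HTML references needed decoding.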