I am downloading a web page (http://library.islamweb.net/hadith/RawyDetails.php?RawyID=1) that contains some Arabic text. It looks fine when viewed with the browser's "View Source" option (Chrome / IE):
<span lang="ar-qa">رقم الراوي</span>
However, after I download it, the same line looks like this:
<span lang="ar-qa">ÑÞã ÇáÑÇæí</span>
My code is very simple:
client.DownloadFile(_webPath, savePath);
What is wrong?
Answer 0 (score: 1)
Your page's character encoding is "windows-1256", so you need to read it using that encoding:
// Requires: using System; using System.IO; using System.Text; using System.Windows.Forms;
private void GetRepliesStats_Load(object sender, EventArgs e)
{
    WebBrowser bro = new WebBrowser();
    // Subscribe before navigating so the completion event cannot be missed.
    bro.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(BrowsingCompleted);
    bro.Navigate("http://library.islamweb.net/hadith/RawyDetails.php?RawyID=1");
}

private void BrowsingCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser browser = sender as WebBrowser;
    Stream documentStream = browser.DocumentStream;
    // Rewind the stream, then decode its bytes as windows-1256 (Arabic).
    documentStream.Position = 0L;
    StreamReader streamReader = new StreamReader(documentStream, Encoding.GetEncoding("windows-1256"));
    String My_Result = streamReader.ReadToEnd();
}
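If you would rather stay with WebClient than host a WebBrowser control, the following is a minimal sketch of the same idea, assuming the page really is served as windows-1256. DownloadFile writes the response bytes to disk unchanged, so the garbling most likely appears when those bytes are later read with a Western code page. The class name ArabicPageDownloader, the helper DecodePage, and the output file name are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class ArabicPageDownloader
{
    // Hypothetical helper: interprets raw page bytes as windows-1256 (Arabic).
    static string DecodePage(byte[] raw)
    {
        return Encoding.GetEncoding("windows-1256").GetString(raw);
    }

    static void Main()
    {
        // Assumption: .NET Framework. On .NET Core / .NET 5+, code-page encodings
        // must be registered first:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Offline check: these are the windows-1256 bytes of the Arabic label
        // from the question.
        byte[] sample = { 0xD1, 0xDE, 0xE3, 0x20, 0xC7, 0xE1, 0xD1, 0xC7, 0xE6, 0xED };
        Console.OutputEncoding = Encoding.UTF8;
        // Decoded with the correct code page: the Arabic text.
        Console.WriteLine(DecodePage(sample));
        // Decoded with the Western code page: the garbled form from the question.
        Console.WriteLine(Encoding.GetEncoding("windows-1252").GetString(sample));

        string url = "http://library.islamweb.net/hadith/RawyDetails.php?RawyID=1";
        string savePath = "RawyDetails.html"; // hypothetical output path

        using (var client = new WebClient())
        {
            // Fetch the raw bytes, decode them explicitly, then save as UTF-8
            // with a byte order mark so editors and browsers detect the encoding.
            byte[] raw = client.DownloadData(url);
            File.WriteAllText(savePath, DecodePage(raw), new UTF8Encoding(true));
        }
    }
}
```

A shorter variant is to set client.Encoding to the windows-1256 encoding and call client.DownloadString(url). The explicit byte-level version above is shown only because it makes the decoding step visible and lets you check it without a network connection.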
I hope this helps.