Web page is not downloaded with the correct Arabic encoding

Time: 2014-01-13 11:35:18

Tags: c# html encoding arabic webclient-download

I am downloading a web page (http://library.islamweb.net/hadith/RawyDetails.php?RawyID=1) that contains some Arabic text. When I view it with the "View source" option in a browser (Chrome / IE), it looks fine:

<span lang="ar-qa">رقم الراوي</span>

However, after downloading it, the same line looks like this:

<span lang="ar-qa">ÑÞã ÇáÑÇæí</span>

My code is very simple:

client.DownloadFile(_webPath, savePath);

What is wrong?

1 answer:

Answer 0: (score: 1)

Your page's character set is "windows-1256", so you need to read it using that encoding:

// Requires: using System; using System.IO; using System.Text; using System.Windows.Forms;
private void GetRepliesStats_Load(object sender, EventArgs e)
{
    WebBrowser bro = new WebBrowser();
    // Subscribe before navigating so the completion handler is already attached.
    bro.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(BrowsingCompleted);
    bro.Navigate("http://library.islamweb.net/hadith/RawyDetails.php?RawyID=1");
}

private void BrowsingCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser browser = sender as WebBrowser;

    // Rewind the raw document stream before reading it.
    Stream documentStream = browser.DocumentStream;
    documentStream.Position = 0L;

    // Decode the bytes as windows-1256 instead of the default encoding.
    StreamReader streamReader = new StreamReader(documentStream, Encoding.GetEncoding("windows-1256"));
    String My_Result = streamReader.ReadToEnd();
}
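
Since the question uses WebClient rather than a WebBrowser control, the same idea can probably be applied there too. The garbled text in the question is exactly what windows-1256 bytes look like when they are displayed as windows-1252, so the downloaded bytes are likely intact and only need to be decoded correctly. Below is a minimal sketch that downloads the raw bytes, decodes them as windows-1256, and saves the text as UTF-8. The class and method names (ArabicPageDownloader, DecodeWindows1256, DownloadAsUtf8) are made up for illustration, and the RegisterProvider call assumes .NET Core / .NET 5 or later, where code-page encodings are not available by default; on .NET Framework that line should not be needed.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the original question or answer.
static class ArabicPageDownloader
{
    static ArabicPageDownloader()
    {
        // Assumption: running on .NET Core / .NET 5+, where windows-1256
        // is only available after registering the code-page provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Interpret raw bytes as windows-1256 text.
    public static string DecodeWindows1256(byte[] raw)
    {
        return Encoding.GetEncoding("windows-1256").GetString(raw);
    }

    // Download the page bytes, decode them, and save the text as UTF-8.
    public static void DownloadAsUtf8(string url, string savePath)
    {
        using (var client = new WebClient())
        {
            byte[] raw = client.DownloadData(url);
            string html = DecodeWindows1256(raw);
            File.WriteAllText(savePath, html, Encoding.UTF8);
        }
    }
}
```

With the variables from the question, the call would be ArabicPageDownloader.DownloadAsUtf8(_webPath, savePath). Note that the saved HTML may still declare windows-1256 in its own meta tag, so that declaration might need updating if the file is later opened in a browser.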

I hope this helps.