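I'm trying to read a web page whose content contains the registered trademark symbol, ®. However, when I inspect `sb` in QuickWatch in the example below, I see a diamond with a question mark (the Unicode replacement character, U+FFFD) instead of ®. The same thing happens if I serialize `sb` and display it in another web page via JavaScript. Is this just how the character is rendered in the QuickWatch window, or am I reading/decoding the page incorrectly? Here is the code: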
using System.IO;
using System.Net;
using System.Text;

const int bufSize = 4096;
const int maxBytesToGet = 5000000;
byte[] buf = new byte[bufSize];
int bytesToGet; // number of bytes returned by the last Read call
StringBuilder sb = new StringBuilder(bufSize);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    using (Stream responseStream = response.GetResponseStream())
    {
        while ((bytesToGet = responseStream.Read(buf, 0, buf.Length)) != 0)
        {
            // Each chunk is decoded on its own, always as UTF-8.
            sb.Append(Encoding.UTF8.GetString(buf, 0, bytesToGet));
            // Note: sb.Length counts characters, not bytes.
            if (sb.Length > maxBytesToGet) break;
        }
    }
}
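Decoding each buffer separately has a second, related pitfall: even when a page really is UTF-8, "®" takes two bytes (0xC2 0xAE), and a `Read` call can end between them. The sketch below simulates that split with two hand-made chunks (an assumption, not captured network data) and compares per-chunk `Encoding.UTF8.GetString` with a `System.Text.Decoder`, which carries incomplete byte sequences over to the next call.

```csharp
using System;
using System.Text;

// "A®B" encoded as UTF-8 is 0x41 0xC2 0xAE 0x42; "®" takes two bytes.
// Simulate two reads whose boundary falls inside the "®" sequence.
byte[] chunk1 = { 0x41, 0xC2 };
byte[] chunk2 = { 0xAE, 0x42 };

// Per-chunk decoding, as in the loop above: each half of "®" is invalid
// on its own, so each half becomes U+FFFD.
string naive = Encoding.UTF8.GetString(chunk1) + Encoding.UTF8.GetString(chunk2);

// A Decoder keeps the dangling 0xC2 between calls and completes it with 0xAE.
Decoder decoder = Encoding.UTF8.GetDecoder();
var sb = new StringBuilder();
foreach (byte[] chunk in new[] { chunk1, chunk2 })
{
    char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
    int n = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
    sb.Append(chars, 0, n);
}
string streamed = sb.ToString();

Console.WriteLine(naive == "A\uFFFD\uFFFDB"); // True
Console.WriteLine(streamed == "A\u00AEB");    // True
```

`StreamReader` uses a decoder internally in the same way, which is one reason it is the simpler choice for reading a response stream.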
Answer 0 (score: 5)
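You are assuming the response is UTF-8. You need to check the response headers to see how the content is actually encoded. It is also easier to use a StreamReader instead of Encoding.GetString.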
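using System.IO;
using System.Net;
using System.Text;

string responseText;
using (var response = (HttpWebResponse)request.GetResponse())
{
    using (Stream responseStream = response.GetResponseStream())
    {
        // CharacterSet is taken from the Content-Type header and may be empty,
        // so fall back to UTF-8 instead of letting GetEncoding throw.
        string charset = response.CharacterSet;
        Encoding encoding = string.IsNullOrEmpty(charset)
            ? Encoding.UTF8
            : Encoding.GetEncoding(charset);
        // StreamReader decodes correctly across internal buffer boundaries.
        using (var reader = new StreamReader(responseStream, encoding))
        {
            responseText = reader.ReadToEnd();
        }
    }
}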
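To see why the header matters, the sketch below decodes the same bytes two ways. It assumes (for illustration only) that the server sends ISO-8859-1, where "®" is the single byte 0xAE; that byte is not valid UTF-8 on its own, so decoding it as UTF-8 yields exactly the diamond-with-question-mark seen in QuickWatch. `ResolveEncoding` is a hypothetical helper, not part of the original answer or of .NET, that maps a charset name to an `Encoding` and falls back to UTF-8 when the name is empty or unsupported.

```csharp
using System;
using System.Text;

// Hypothetical helper: map a charset name (e.g. the value of
// HttpWebResponse.CharacterSet) to an Encoding, falling back to UTF-8
// when the name is empty or not supported on this runtime.
static Encoding ResolveEncoding(string charset)
{
    if (string.IsNullOrWhiteSpace(charset)) return Encoding.UTF8;
    try
    {
        // Strip whitespace and any quotes around the header value.
        return Encoding.GetEncoding(charset.Trim().Trim('"'));
    }
    catch (ArgumentException)
    {
        return Encoding.UTF8;
    }
}

// "A®B" as an ISO-8859-1 server would send it: 0xAE is "®" in Latin-1.
byte[] body = { 0x41, 0xAE, 0x42 };

// Assuming UTF-8: the lone 0xAE is invalid, so it becomes U+FFFD.
string assumedUtf8 = Encoding.UTF8.GetString(body);

// Using the declared charset: 0xAE decodes to U+00AE ("®").
string declared = ResolveEncoding("ISO-8859-1").GetString(body);

Console.WriteLine(assumedUtf8 == "A\uFFFDB"); // True
Console.WriteLine(declared == "A\u00AEB");    // True
```

Falling back to UTF-8 is a design choice for this sketch; a stricter client might instead inspect a `<meta charset>` tag in the HTML or reject the response.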