What is Google's character encoding?

Time: 2013-10-11 20:04:40

Tags: asp.net character-encoding

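Apparently Google's encoding is UTF-8, since its HTML meta tag says so. But when I open the search page for "scharfes s" with ASP.NET's WebRequest.GetResponse(), the result is full of unrecognizable characters. Does anyone know what is going on there?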

For convenience, the code is pasted below.

ASP page

<form id="form1" runat="server">
<div>
    <div runat="server" id="output"/>
</div>
</form>

Code-behind

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Net;
using System.IO;
using System.Text;

public partial class SearchEngineCaller : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        HttpWebRequest queryPage = (HttpWebRequest)WebRequest.Create("https://www.google.com/search?q=scharfes+s");
        queryPage.Credentials = CredentialCache.DefaultCredentials;

        HttpWebResponse response = (HttpWebResponse)queryPage.GetResponse();

        Stream receiveStream = response.GetResponseStream();
        StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8);
        output.InnerHtml = readStream.ReadToEnd();
    }
}

Returned Result

Which encoding should I use?

1 answer:

Answer 0 (score: 2)

You have to set some HTTP headers on the HttpWebRequest object:

HttpWebRequest queryPage = (HttpWebRequest)WebRequest.Create("https://www.google.com/search?q=scharfes+s");
queryPage.Credentials = CredentialCache.DefaultCredentials;
queryPage.Accept = "text/html";
queryPage.Headers["Accept-Charset"] = "utf-8";
queryPage.UserAgent = "Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/21.0";

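Important: setting Accept-Charset alone is not enough; setting User-Agent matters as well (the user agent string above was copied from another page). I tried this solution with a small test program and it works for me.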
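As a complement to the headers above, here is a minimal sketch that does not hard-code Encoding.UTF8 but decodes the body with whatever charset the server reports in its Content-Type header (exposed as HttpWebResponse.CharacterSet). The class name PageFetcher and the helper ResolveEncoding are hypothetical names made up for this illustration, and falling back to UTF-8 when no usable charset is reported is an assumption, not something Google documents.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class PageFetcher
{
    public static string Fetch(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Accept = "text/html";
        request.Headers["Accept-Charset"] = "utf-8";
        // Same user agent string as in the answer above.
        request.UserAgent = "Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/21.0";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            // Decode with the charset the server actually declared for this response.
            Encoding encoding = ResolveEncoding(response.CharacterSet);
            using (StreamReader reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }

    // Maps a charset name to an Encoding; assumes UTF-8 when the name
    // is missing or not recognized by the runtime.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return Encoding.UTF8;
        }
        try
        {
            return Encoding.GetEncoding(charset.Trim('"', ' '));
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;
        }
    }
}
```

In the code-behind from the question, this would be used as output.InnerHtml = PageFetcher.Fetch("https://www.google.com/search?q=scharfes+s");. If the output is still garbled, logging response.CharacterSet is a quick way to see which charset the server chose for that particular request.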