C# encoding problem when using HttpWebRequest

Time: 2011-02-21 18:06:52

Tags: c# regex httpwebrequest

When I return a string from an HttpWebRequest, I get HTML character codes (&#39; and &quot;) that break my responses (they are displayed as 39; and uot;):

internal static void TranslateThis(Player player, string fromLang, string toLang, string text){
    try
    {
        string translated = null;
        HttpWebRequest hwr = (HttpWebRequest)HttpWebRequest.Create("http://translate.google.com/?langpair=" + fromLang + "|" + toLang + "&text=" + text.Replace(" ", "+") + "#");
        HttpWebResponse res = (HttpWebResponse)hwr.GetResponse();
        StreamReader sr = new StreamReader(res.GetResponseStream());
        string html = sr.ReadToEnd();
        int a = html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47;
        int b = html.IndexOf("</span>",html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47);
        translated = html.Substring(a, b - a);
        if (translated.Length < (10 * text.Length)){
            if (player == Player.Console)
            {
                player.ParseMessage(translated, true);
            }
            else
            {
                player.ParseMessage(translated, false);
            }
        } else {
            player.Message("Usage: /translate [lang] [message]");
        }
    }
    catch
    {
        player.Message("Usage: /translate [lang] [message]");
    }
}

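3 answers:

Answer 0: (score: 1)

First, make sure you are reading the downloaded content with the correct encoding. For code showing how to do that, see this SO answer.

Basically, check the HTTP headers and the meta tags for the encoding and re-encode the content if necessary. Then call HttpUtility.HtmlDecode to get rid of any HTML-encoded characters. At that point you are ready to start searching for whatever you are looking for.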
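The two steps described above (pick the encoding, then HTML-decode) might look roughly like the sketch below. The `ResponseText` class and its method names are hypothetical and not part of the original code. It assumes the charset name has already been read from the response, for example from `HttpWebResponse.CharacterSet`, it falls back to UTF-8 when the name is missing or unknown, and it does not inspect meta tags. It uses `WebUtility.HtmlDecode` (available from .NET Framework 4) so that no reference to System.Web is needed.

```csharp
using System;
using System.Net;
using System.Text;

static class ResponseText
{
    // Hypothetical helper: maps a charset name such as "ISO-8859-1" to an Encoding.
    // Falls back to UTF-8 when the name is missing or not recognised.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            // Some servers wrap the charset name in quotes, so trim them.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;
        }
    }

    // Decodes the raw response bytes with the resolved encoding,
    // then turns HTML entities such as &#39; and &quot; back into characters.
    public static string DecodeBody(byte[] body, string charset)
    {
        string html = ResolveEncoding(charset).GetString(body);
        return WebUtility.HtmlDecode(html);
    }
}
```

With the question's code, the charset would come from `res.CharacterSet` and the bytes from the response stream.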

I would also recommend using something like the Html Agility Pack to parse the HTML instead of IndexOf.

Answer 1: (score: 1)

It is hard to tell exactly what your ParseMessage method expects, so this is only a guess:

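The result you get from Google Translate is HTML. That means that if you want plain-text output, you have to convert the HTML to text. You have already managed to extract the translation from the HTML page (for now, at least, until Google Translate changes its output page a little; your solution is not exactly foolproof or future-proof). But the translation is still HTML-encoded, and you need to decode it. To do that you can use the WebUtility.HtmlDecode method (assuming you are on .NET Framework 4). After the line

translated = html.Substring(a, b - a);

add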

translated = WebUtility.HtmlDecode(translated);
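Put together, the extract-then-decode step could be sketched as follows. The `TranslationExtractor` class is hypothetical. The marker string is the one from the question, and the sketch assumes the page still contains it, which may no longer be true of the real Google Translate output. Using `Marker.Length` replaces the hard-coded 47, and the bounds checks avoid the exception that Substring throws when the marker is missing.

```csharp
using System;
using System.Net;

static class TranslationExtractor
{
    // Marker taken from the question's code; it is 47 characters long.
    const string Marker = "onmouseout=\"this.style.backgroundColor='#fff'\">";

    // Returns the decoded text between the marker and the next </span>,
    // or null when either delimiter cannot be found.
    public static string Extract(string html)
    {
        int markerPos = html.IndexOf(Marker, StringComparison.Ordinal);
        if (markerPos < 0)
            return null;

        int a = markerPos + Marker.Length;
        int b = html.IndexOf("</span>", a, StringComparison.Ordinal);
        if (b < 0)
            return null;

        // Turn entities such as &#39; and &quot; back into ' and ".
        return WebUtility.HtmlDecode(html.Substring(a, b - a));
    }
}
```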

Answer 2: (score: 1)

A discussion with another developer led me to try this before the last batch of comments arrived. Here is what finally worked:

    internal static void TranslateThis(Player player, string fromLang, string toLang, string text){
        try
        {
            string translated = null;
            text = Regex.Replace(text, @"[^\w\.\'\s@-]", "");
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://translate.google.com/?langpair=" + fromLang + "|" + toLang + "&text=" + text.Replace(" ", "+") + "#");

            request.MaximumAutomaticRedirections = 4;
            request.MaximumResponseHeadersLength = 4;

            request.Credentials = CredentialCache.DefaultCredentials;
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();

            Stream receiveStream = response.GetResponseStream();

            StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF7);
            String html = readStream.ReadToEnd() + "";
            int a = html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47;
            int b = html.IndexOf("</span>",html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47);
            translated = html.Substring(a, b - a);
            response.Close();
            readStream.Close();
            if (translated.Length < (10 * text.Length))
            {
                translated = translated.Replace("&#39;", "'");
                translated = Regex.Replace(translated, @"[^\w\.\'\s@-]", "");
                if (player == Player.Console)
                {
                    player.ParseMessage(translated, true);
                }
                else
                {
                    player.ParseMessage(translated, false);
                }
            }
            else
            {
                player.Message("Usage: /translate [lang] [message]");
            }
        }
        catch(Exception ex)
        {
            player.Message("Error:" + ex.ToString());

        }
   }
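The code above strips most punctuation from the text before it goes into the URL. A possible alternative, shown here only as a sketch, is to percent-encode the query values with `Uri.EscapeDataString` so that the original characters survive. The `TranslateUrl` class is hypothetical, the trailing "#" from the original URL is dropped, and whether the Google Translate page accepts the escaped form of the `|` separator is an assumption that has not been checked.

```csharp
using System;

static class TranslateUrl
{
    // Builds the request URL with each query value percent-encoded,
    // instead of removing characters and replacing spaces with "+".
    public static string Build(string fromLang, string toLang, string text)
    {
        return "http://translate.google.com/?langpair="
            + Uri.EscapeDataString(fromLang + "|" + toLang)
            + "&text=" + Uri.EscapeDataString(text);
    }
}
```

The result can be passed to WebRequest.Create in place of the concatenated string.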