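Displaying non-ASCII characters with HttpClient

Time: 2010-12-23 20:53:24

Tags: java android html httpclient

I'm using the code below to fetch the entire HTML of a website, but non-ASCII characters don't come through. A character such as å shows up as a diamond with a question mark (�).
I suspect the charset is the cause. What should it be?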

    Log.e("HTML", "henter htmlen..");
    String url = "http://beep.tv2.dk";
    HttpClient client = new DefaultHttpClient();
    client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
            HttpVersion.HTTP_1_1);
    client.getParams().setParameter(CoreProtocolPNames.HTTP_ELEMENT_CHARSET, "UTF-8");
    HttpGet request = new HttpGet(url);
    HttpResponse response = client.execute(request);
    // The next two lines were left unfinished in the original post:
    // Header h = HeaderValueFormatter
    // response.addHeader(header)
    String html = "";
    InputStream in = response.getEntity().getContent();
    // No charset is passed here, so the platform default decodes the bytes
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    StringBuilder str = new StringBuilder();
    String line = null;
    while ((line = reader.readLine()) != null) {
        str.append(line);
    }
    in.close();
    //b = false;
    html = str.toString();

3 Answers:

Answer 0 (score: 3)

Thanks. This worked (in case anyone else runs into the same problem):

    HttpClient client = new DefaultHttpClient();
    client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
            HttpVersion.HTTP_1_1);
    client.getParams().setParameter(CoreProtocolPNames.HTTP_ELEMENT_CHARSET, "iso-8859-1");
    HttpGet request = new HttpGet(url);
    request.setHeader("Accept-Charset", "iso-8859-1, unicode-1-1;q=0.8");
    HttpResponse response = client.execute(request);
    String html = "";
    InputStream in = response.getEntity().getContent();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, "iso-8859-1"));
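
Hard-coding iso-8859-1 works for this particular site. A more general option is to use the charset the server declares in its Content-Type header, with a fallback when none is declared. Below is a minimal sketch using only the JDK. It assumes the header value is already available as a string, and `ContentTypeCharset` and `charsetFromContentType` are hypothetical names for illustration, not part of HttpClient.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Hypothetical helper: pulls the charset parameter out of a Content-Type
    // value such as "text/html; charset=iso-8859-1". Returns the fallback
    // when the header is missing, has no charset, or names an unknown one.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType(
                "text/html; charset=iso-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetFromContentType(
                "text/html", StandardCharsets.ISO_8859_1));
    }
}
```

The resulting `Charset` can be passed straight to `new InputStreamReader(in, charset)` in place of the hard-coded name.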

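Answer 1 (score: 2)

  1. Use the new InputStreamReader(in, "UTF-8") constructor.
  2. Set the Accept-Charset request header, for example Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
  3. Make sure the page displays correctly in a browser. If it doesn't, the problem may be on the server side.
  4. If none of the above works, use Firebug (or a similar tool) to inspect the other headers.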
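
To see why the charset passed to the reader matters, it can help to decode the same bytes two ways. The sketch below assumes the server sends "å" as the single ISO-8859-1 byte 0xE5. `DecodeDemo` and `decode` are hypothetical names, and no network access is involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes raw bytes with an explicitly chosen charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The letter a-with-ring as an ISO-8859-1 server would send it: one byte, 0xE5
        byte[] latin1Bytes = "\u00e5".getBytes(StandardCharsets.ISO_8859_1);

        // 0xE5 alone is not valid UTF-8, so the decoder substitutes U+FFFD,
        // the replacement character that renders as a diamond with a question mark
        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in recovers the letter
        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 decode contains U+FFFD: " + (wrong.indexOf('\uFFFD') >= 0));
        System.out.println("ISO-8859-1 decode matches:    " + right.equals("\u00e5"));
    }
}
```

The same reasoning applies in reverse: if the server actually sends UTF-8, reading it as ISO-8859-1 yields two wrong characters per "å" instead of a replacement character.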

Answer 2 (score: 1)

This really helped me get started, but I hit the same problem when reading a text file. It was fixed with the following:

    BufferedReader br = new BufferedReader(new InputStreamReader(new 
                FileInputStream(fileName), "iso-8859-1"));

...and of course, the HTTP response also needs to have its encoding set:

    response.setCharacterEncoding("UTF-8");
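
The file-reading fix above can be checked with a small round trip: write a few bytes in ISO-8859-1, then read them back with the charset given explicitly. This is only a sketch under assumptions. `FileCharsetDemo`, `readFirstLine`, and the temporary file are made up for illustration, and it needs Java 7 or later for `java.nio.file`.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileCharsetDemo {
    // Reads the first line of a file, decoding it with the given charset
    // instead of relying on the platform default.
    static String readFirstLine(String fileName, Charset charset) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), charset))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "bl\u00e5b\u00e6rgr\u00f8d"; // sample Danish word with non-ASCII letters
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            // Write the sample as ISO-8859-1 bytes
            Files.write(tmp, text.getBytes(StandardCharsets.ISO_8859_1));
            // Read it back with the same charset
            String line = readFirstLine(tmp.toString(), StandardCharsets.ISO_8859_1);
            System.out.println("Round trip intact: " + text.equals(line));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```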

Thanks for the help!