Android encoding for this character set

Time: 2014-03-18 12:56:36

Tags: android text utf-8 character-encoding

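I am fetching HTML text from a website. The site returns characters like those in the image below. I looked for the charset declared by the site and found <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">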

enter image description here

After I set the text on a TextView, the device displays output like this:

enter image description here

I tried several encodings, but none of them changed the text, as shown below:

    final Charset windowsCharset = Charset.forName("windows-1252");
    final Charset utfCharset = Charset.forName("UTF-8");
    final CharBuffer windowsEncoded = windowsCharset.decode(ByteBuffer
            .wrap(ne.scrape_detail_article_text.getBytes()));
    final byte[] utfEncoded = utfCharset.encode(windowsEncoded).array();
    // System.out.println(new String(utfEncoded, utfCharset.displayName()));

    String s = "" ;
    try {
        // String s = new String(utfEncoded, utfCharset.displayName());

        //String s = new String(texttoencoding.getBytes("windows-1252"),"UTF-8");

        s = URLEncoder.encode(texttoencoding, "windows-1252");

        Log.e("LOG", "Encoded >> " + s);
    } catch (UnsupportedEncodingException e) {
        Log.e("utf8", "conversion", e);
    }

    TextviewToset.setText(Html.fromHtml(texttoencoding));
    TextviewToset.setMovementMethod(LinkMovementMethod.getInstance());

Please help: how can I encode this text as UTF-8 and display it in a TextView?

Thanks in advance.

1 answer:

Answer 0 (score: 0)

It looks like you are dealing with HTML entities here. You therefore have to decode the HTML entities like this:

    String text = Html.fromHtml(yourText).toString();

This will give you the correctly decoded characters. Html.fromHtml() is documented in the Android reference for android.text.Html.
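
To illustrate what decoding HTML entities means, here is a minimal plain-Java sketch that handles only numeric entities such as `&#233;` or `&#x20AC;`. The class and method names (EntityDecodeSketch, decodeNumericEntities) are hypothetical, and this is not how Html.fromHtml() is implemented; it only shows the idea, assuming the garbled text really does contain entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; on Android, prefer Html.fromHtml().
class EntityDecodeSketch {

    // Matches decimal (&#233;) and hexadecimal (&#xE9;) numeric entities.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decodeNumericEntities(String input) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String hex = m.group(1);
            int codePoint = (hex != null)
                    ? Integer.parseInt(hex, 16)
                    : Integer.parseInt(m.group(2), 10);
            // Leave the entity untouched if it is not a valid Unicode code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Named entities such as &amp; are deliberately not handled in this sketch.
        System.out.println(decodeNumericEntities("caf&#233; costs &#x20AC;5"));
    }
}
```

The result is an ordinary Java String, which already holds Unicode text, so it can be passed to setText() without any further conversion to UTF-8.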