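I am fetching HTML text from a website, and the site returns characters like those in the image below. I looked for the charset the site declares and found <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
After I set the text in a TextView, the device displays output like this:
I tried a few encodings, but none of them changed the text. Here is what I tried: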
    final Charset windowsCharset = Charset.forName("windows-1252");
    final Charset utfCharset = Charset.forName("UTF-8");
    // Note: getBytes() with no argument uses the platform default charset
    final CharBuffer windowsEncoded = windowsCharset.decode(ByteBuffer
            .wrap(ne.scrape_detail_article_text.getBytes()));
    final byte[] utfEncoded = utfCharset.encode(windowsEncoded).array();
    // System.out.println(new String(utfEncoded, utfCharset.displayName()));
    String s = "";
    try {
        // String s = new String(utfEncoded, utfCharset.displayName());
        // String s = new String(texttoencoding.getBytes("windows-1252"), "UTF-8");
        s = URLEncoder.encode(texttoencoding, "windows-1252");
        Log.e("LOG", "Encoded >> " + s);
    } catch (UnsupportedEncodingException e) {
        Log.e("utf8", "conversion", e);
    }
    TextviewToset.setText(Html.fromHtml(texttoencoding));
    TextviewToset.setMovementMethod(LinkMovementMethod.getInstance());
Please help: how can I encode this text as UTF-8 and display it in a TextView?
Thanks in advance.
Answer 0 (score: 0)
It looks like you are dealing with HTML entities here, so you have to decode them like this:
    String text = Html.fromHtml(yourText).toString();
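This gives you the correctly decoded characters as a Java String, which you can pass straight to the TextView. See the Android documentation for Html.fromHtml() for details.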
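To make the two decoding steps concrete, here is a minimal plain-Java sketch, not Android code and not how Html.fromHtml() works internally. It assumes you can get at the raw response bytes before they become a String. The class name CharsetSketch and both helper methods are hypothetical names made up for this illustration. The first helper decodes the bytes with the charset the page declares (windows-1252 here). The second handles only numeric entities such as &#233; and leaves named entities like &amp; alone, which Html.fromHtml() would also decode on Android.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; none of these names come from the question.
public class CharsetSketch {

    // Matches numeric entities: hex (&#xE9;) in group 1, decimal (&#233;) in group 2.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Decode the raw response bytes once, using the charset the page declares.
    static String decodeBody(byte[] rawBytes, Charset declaredCharset) {
        return new String(rawBytes, declaredCharset);
    }

    // Replace numeric character references with the characters they stand for.
    // Named entities (&amp;, &nbsp;, ...) are intentionally left untouched.
    static String decodeNumericEntities(String html) {
        Matcher m = NUMERIC_ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Keep the original text if the number is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, CharsetSketch.decodeBody(bytes, Charset.forName("windows-1252")) followed by CharsetSketch.decodeNumericEntities(...) yields an ordinary Java String. Once the text is a String it is already Unicode, so calling getBytes() and decoding again, as in the question, generally cannot repair text that was decoded with the wrong charset in the first place.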