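So, I'm using this code to fetch the entire HTML of a website, but I don't seem to get the non-ASCII characters. All I get is diamonds with question marks.
Characters like å come out looking like this: �
I suspect it's because of the charset. What should it be?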
Log.e("HTML", "henter htmlen..");
String url = "http://beep.tv2.dk";
HttpClient client = new DefaultHttpClient();
client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
HttpVersion.HTTP_1_1);
client.getParams().setParameter(CoreProtocolPNames.HTTP_ELEMENT_CHARSET, "UTF-8");
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
// Header h = HeaderValueFormatter   (incomplete leftover, does not compile)
// response.addHeader(header)
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    str.append(line);
}
in.close();
//b = false;
html = str.toString();
Answer 0 (score: 3)
Thanks. This works (in case anyone else has the same problem):
HttpClient client = new DefaultHttpClient();
client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
HttpVersion.HTTP_1_1);
client.getParams().setParameter(CoreProtocolPNames.HTTP_ELEMENT_CHARSET, "iso-8859-1");
HttpGet request = new HttpGet(url);
request.setHeader("Accept-Charset", "iso-8859-1, unicode-1-1;q=0.8");
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in,"iso-8859-1"));
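A likely explanation for why this works: the page is apparently served as ISO-8859-1, where å is the single byte 0xE5. That byte on its own is not valid UTF-8, so a UTF-8 decoder substitutes U+FFFD, which renders as the diamond with a question mark. Here is a minimal sketch using only the JDK. The class and method names are made up for illustration, and the one-byte input is an assumed sample, not data captured from the site:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode raw bytes with an explicit charset. Malformed input is replaced
    // with U+FFFD, as InputStreamReader also does by default.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // Assumed sample: "å" as an ISO-8859-1 server would send it.
        byte[] raw = {(byte) 0xE5};

        String wrong = decode(raw, StandardCharsets.UTF_8);
        String right = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(Integer.toHexString(wrong.charAt(0))); // fffd
        System.out.println(Integer.toHexString(right.charAt(0))); // e5
    }
}
```

So the charset passed to InputStreamReader has to match what the server actually sends, not what you would like it to be.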
Answer 1 (score: 2)
Try the new InputStreamReader(in, "UTF-8") constructor,
and set the Accept-Charset request header, for example:
Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
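Rather than hard-coding a charset, one option is to read it from the response's Content-Type header and only fall back to a default when it is missing. Below is a rough sketch with the JDK only. The helper name charsetOf is hypothetical, and the header value would have to come from your HTTP client, which is not shown here:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Hypothetical helper: extract the charset parameter from a Content-Type
    // value such as "text/html; charset=ISO-8859-1". Returns the fallback if
    // the parameter is absent or names a charset this JVM does not know.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cs = charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println(cs.name()); // ISO-8859-1
    }
}
```

The result can then be passed to new InputStreamReader(in, cs). Keep in mind that some pages declare the charset only in a meta tag inside the HTML, which this sketch does not look at.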
Answer 2 (score: 1)
This really helped me get started, but I ran into the same problem when reading a text file. It was fixed with the following:
BufferedReader br = new BufferedReader(new InputStreamReader(new
FileInputStream(fileName), "iso-8859-1"));
...and of course, the HTTP response needs to have its encoding set as well:
response.setCharacterEncoding("UTF-8");
Thanks for the help!
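For the text file case, here is a small self-contained sketch of the same idea using java.nio, which also closes the reader for you. The class name, the helper names and the temp-file round trip are only there for illustration. Note that, unlike InputStreamReader, Files.newBufferedReader throws an exception on malformed bytes instead of silently substituting a replacement character:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadWithCharset {
    // Read a whole file with an explicit charset. Each line is re-terminated
    // with '\n', since readLine() strips the original line ending.
    static String readAll(Path file, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = Files.newBufferedReader(file, cs)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Illustration only: write text to a temp file in the given charset,
    // then read it back with the same charset.
    static String roundTrip(String text, Charset cs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(cs));
                return readAll(tmp, cs);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String back = roundTrip("bl\u00E5b\u00E6r", StandardCharsets.ISO_8859_1);
        System.out.println(back.equals("bl\u00E5b\u00E6r\n")); // true
    }
}
```

The important part is just that the charset used for reading matches the one the file was written in.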