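I am parsing a UTF-8 encoded XML file that contains Arabic characters. Everything else works, but the Arabic text is not displayed; instead, strange characters like these appear: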
ÙرÙÙ
This is the link to the XML file being parsed: http://212.12.165.44:7201/UniNews121.xml
Below is the code:
public String getXmlFromUrl(String url) {
    try {
        return new AsyncTask<String, Void, String>() {
            @Override
            protected String doInBackground(String... params) {
                //String xml = null;
                try {
                    DefaultHttpClient httpClient = new DefaultHttpClient();
                    httpClient.getParams().setParameter(CoreProtocolPNames.HTTP_CONTENT_CHARSET,"UTF-8");
                    HttpGet httpPost = new HttpGet(params[0]);
                    HttpResponse httpResponse = httpClient.execute(httpPost);
                    HttpEntity httpEntity = httpResponse.getEntity();
                    xml = new String(EntityUtils.toString(httpEntity).getBytes(),"UTF-8");
                } catch (Exception e) {
                    e.printStackTrace();
                }
                //just to remove the BOM Element
                xml=xml.substring(3);
                //Here am printing the xml and the arabic chars are malformed
                Log.i("DEMO", xml);
                return xml;
            }
        }.execute(url).get();
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (ExecutionException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return xml;
}
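Note that no error occurs and everything else works; only the Arabic characters are malformed.
Thanks for your help, but please be specific in your answer.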
Answer 0 (score: 1)
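This line
xml = new String(EntityUtils.toString(httpEntity).getBytes(),"UTF-8");
does not do what you want. EntityUtils.toString(httpEntity) decodes the response with a default charset (ISO-8859-1 in Apache HttpClient when the response declares none). getBytes(), called without an encoding, then re-encodes that string with the platform encoding, and new String(..., "UTF-8") tries to read the resulting byte[] as UTF-8. By that point the Arabic characters have already been decoded incorrectly, so re-encoding cannot restore them.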
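The effect can be reproduced without any HTTP code. The following is a minimal sketch, not the asker's code: the class name MojibakeDemo, its helper methods, and the sample word are made up for illustration, and it assumes the platform default encoding is UTF-8 (as on Android) and that the wrong decode step uses ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Decodes UTF-8 bytes with the wrong charset (ISO-8859-1), which is what
    // happens when the response declares no charset and a default is used.
    public static String decodeWrongly(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Mirrors the pattern from the question: re-encode, then decode as UTF-8.
    // Assumption: the platform default encoding is UTF-8, so it is spelled out here.
    public static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0645\u0631\u062D\u0628\u0627"; // an Arabic word, 5 characters
        byte[] wire = original.getBytes(StandardCharsets.UTF_8); // bytes as sent by the server

        String garbled = decodeWrongly(wire);
        System.out.println(garbled.equals(original));            // false: text is already damaged
        System.out.println(roundTrip(garbled).equals(garbled));  // true: the round trip changes nothing
        System.out.println(new String(wire, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

The only step that recovers the text is decoding the original bytes as UTF-8 in the first place.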
You only need to call:
xml = EntityUtils.toString(httpEntity, "UTF-8");
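One side effect is worth checking after this change. Once the bytes are decoded as UTF-8, a leading byte order mark becomes the single character U+FEFF rather than three garbled characters, so the xml.substring(3) in the question would probably cut off two real characters. A minimal sketch of a safer strip follows; BomStripper and stripBom are hypothetical names, not part of HttpClient or the asker's code.

```java
public class BomStripper {
    // Removes a leading U+FEFF (the decoded BOM) if present; otherwise returns the text unchanged.
    public static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        String withBom = "\uFEFF<?xml version=\"1.0\"?>";
        System.out.println(stripBom(withBom).startsWith("<?xml"));                 // true
        System.out.println(stripBom("<?xml version=\"1.0\"?>").startsWith("<?xml")); // true
    }
}
```

Checking for the character instead of cutting a fixed length also keeps the code working if the server ever stops sending a BOM.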