Question

Possible duplicate:
Parsing an UTF-8 Encodded XML file

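I am parsing a UTF-8 encoded XML file that contains some Arabic characters. Everything else works fine, but the Arabic characters are not displayed; instead, strange characters like these appear: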

ÙØ±ÙÙ

This is the link to the XML file being parsed: http://212.12.165.44:7201/UniNews121.xml

Here is the code:

        public String getXmlFromUrl(String url) {

        try {
            return new AsyncTask<String, Void, String>() {
                @Override
                protected String doInBackground(String... params) {
                    //String xml = null;
                    try {

                        DefaultHttpClient httpClient = new DefaultHttpClient();
                        httpClient.getParams().setParameter(CoreProtocolPNames.HTTP_CONTENT_CHARSET,"UTF-8");
                        HttpGet httpPost = new HttpGet(params[0]);
                        HttpResponse httpResponse = httpClient.execute(httpPost);
                        HttpEntity httpEntity = httpResponse.getEntity();
                        xml = new String(EntityUtils.toString(httpEntity).getBytes(),"UTF-8");

                    } catch (Exception e) {
                        e.printStackTrace();
                    }

                    // just to remove the BOM element
                    xml = xml.substring(3);

                    // Here I am printing the xml, and the Arabic chars are malformed
                    Log.i("DEMO", xml);
                    return xml;

                }
            }.execute(url).get();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (ExecutionException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return xml;
    }

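Please note that no error occurs and everything else works; only the Arabic characters are malformed.

Thanks for your help, but please be specific in your answers.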

Answer 1

This line

xml = new String(EntityUtils.toString(httpEntity).getBytes(),"UTF-8");

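does not do what you want. EntityUtils.toString() without a charset argument decodes the body using the charset declared by the response, or a default (ISO-8859-1 in Apache HttpClient) when none is declared. You then call getBytes(), which, with no encoding specified, uses the platform encoding, and finally new String(..., "UTF-8"), which tries to read that byte[] as UTF-8 bytes. By that point the text has already been decoded with the wrong charset, so re-encoding and re-decoding it cannot bring the Arabic characters back.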

You just need to call

xml = EntityUtils.toString(httpEntity, "UTF-8");
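To see why the round trip cannot help, here is a minimal sketch in plain Java with no Android or HttpClient dependency. The class and method names (CharsetDemo, decodeLikeQuestion, decodeAsUtf8, stripBom) are made up for illustration. It assumes the server sends no charset, so ISO-8859-1 stands in for what EntityUtils.toString(entity) falls back to, and it assumes the platform encoding used by getBytes() is UTF-8.

```java
import java.nio.charset.StandardCharsets;

class CharsetDemo {

    // Mimics the line from the question. ISO_8859_1 stands in for the
    // fallback charset (assumption: the response declares none), and
    // UTF_8 stands in for the platform encoding (assumption).
    static String decodeLikeQuestion(byte[] body) {
        String misdecoded = new String(body, StandardCharsets.ISO_8859_1);
        // Encoding and decoding with the same charset is a no-op,
        // so the damage from the first step is still there.
        return new String(misdecoded.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.UTF_8);
    }

    // Mimics EntityUtils.toString(entity, "UTF-8"): decode the raw bytes once.
    static String decodeAsUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    // Once the text is decoded correctly, a UTF-8 BOM is the single
    // char U+FEFF, not three chars.
    static String stripBom(String s) {
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        // BOM followed by a small XML fragment with three Arabic letters.
        String original = "\uFEFF<t>\u062E\u0628\u0631</t>";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeLikeQuestion(body).equals(original)); // false
        System.out.println(decodeAsUtf8(body).equals(original));       // true
        System.out.println(stripBom(decodeAsUtf8(body)));
    }
}
```

One consequence for the code in the question: xml.substring(3) only "worked" because the three BOM bytes had been misdecoded into three separate chars. With the body decoded as UTF-8, the BOM is one char, so substring(3) would cut off the first two characters of the XML; checking for U+FEFF as in stripBom above is safer.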
