Unable to parse UTF-8 XML

Time: 2015-07-11 09:35:17

Tags: java android xml unicode utf-8

My external XML already has

<?xml version="1.0" encoding="UTF-8"?>

However, when I try to parse it in my app, the Unicode characters are not read correctly at all!

Here is what I have done so far, still with no luck.

private class MyDownloadTask extends AsyncTask<Void,Void,Void>
{
    String URL = context.getResources().getString(R.string.XML_database_url);
    String KEY_ITEM = "item"; // parent node
    String KEY_NAME = "name";
    String KEY_COST = "location";
    String KEY_DESC = "url";
    ArrayList<RadioListElement> radioArray;

    protected void onPreExecute(final ArrayList<String> userRadios) {
        super.onPreExecute();
        radioArray = new ArrayList<RadioListElement>();
        MainActivity.getDataManager().loadStoredRadioStations(radioArray, userRadios);
    }

    protected Void doInBackground(Void... params) {
        String xml = getXmlFromUrl(URL);
        Document doc = getDomElement(xml);

        NodeList nl = doc.getElementsByTagName(KEY_ITEM);
        for (int i = 0; i < nl.getLength(); i++) {
            Element e = (Element) nl.item(i);
            String name = getValue(e, KEY_NAME);
            String cost = getValue(e, KEY_COST);
            String description = getValue(e, KEY_DESC);
            radioArray.add(new RadioListElement(context, name, cost, description));
        }
        return null;
    }

public Document getDomElement(String xml){
        Document doc = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {

            DocumentBuilder db = dbf.newDocumentBuilder();

            InputSource is = new InputSource(is,"UTF-8");
            is.setCharacterStream(new StringReader(xml));

            doc = db.parse(is);

        } catch (ParserConfigurationException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (SAXException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (IOException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        }
        // return DOM
        return doc;
    }

I put UTF-8 here:

                InputSource is = new InputSource(is,"UTF-8");

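What am I doing wrong? How can I get this to display Unicode properly?

2 Answers:

Answer 0: (score: 1)

Don't convert the XML to a string yourself and then feed that string to the DOM parser. An XML parser is smart enough to work out the encoding on its own from the XML declaration.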
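To illustrate why setting the encoding in getDomElement cannot help: once an InputSource has a character stream (a StringReader), the text is already decoded, and according to the InputSource documentation the encoding setting has no effect. The following is a minimal sketch using only the JDK; the class name CharStreamDemo and the helper nameFromString are hypothetical names made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharStreamDemo {

    // Hypothetical helper: parses an already-decoded String and returns the first <name> text.
    static String nameFromString(String xml, String encoding) throws Exception {
        InputSource is = new InputSource(new StringReader(xml));
        // No effect here: the parser reads the character stream directly.
        is.setEncoding(encoding);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
        return doc.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String greek = "\u03A1\u03AC\u03B4\u03B9\u03BF"; // five Greek letters
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<item><name>" + greek + "</name></item>";
        // Even a deliberately wrong encoding does not change the result.
        System.out.println(nameFromString(xml, "ISO-8859-1").equals(greek));
    }
}
```

So if the string handed to the parser is already garbled, the damage most likely happened earlier, when the HTTP response bytes were turned into a String.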

I suggest changing getXmlFromUrl(String url) so that it returns an InputStream obtained from the httpEntity, like this:

return httpEntity.getContent();

Then feed this InputStream to the DOM parser, like this:

InputSource is = new InputSource(inputStream);

Note that no encoding is set on is.

Now parse this is and verify that the Unicode text comes through as expected.
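The approach above can be sketched end to end with plain JDK classes. This is only an illustration under assumptions: a ByteArrayInputStream stands in for the stream returned by httpEntity.getContent(), and the names StreamParseDemo, parse and firstName are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StreamParseDemo {

    // Parses raw bytes; the parser picks up the encoding from the XML declaration.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(in)); // no encoding set on the InputSource
    }

    // Hypothetical helper: returns the text of the first <name> element.
    static String firstName(InputStream in) throws Exception {
        return parse(in).getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String greek = "\u03A1\u03AC\u03B4\u03B9\u03BF"; // five Greek letters
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<item><name>" + greek + "</name></item>";
        // Stand-in for the HTTP response body: UTF-8 encoded bytes.
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstName(in).equals(greek));
    }
}
```

Because the bytes are never decoded into a String by hand, there is no step where a wrong default charset could be applied.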

Answer 1: (score: 0)

I added utf-8 to the code that fetches the XML from the URL. It should look like this:

xml = EntityUtils.toString(httpEntity,"utf-8");

public String getXmlFromUrl(String url) {
    String xml = null;
    try {
        DefaultHttpClient httpClient = new DefaultHttpClient();
        HttpPost httpPost = new HttpPost(url);

        HttpResponse httpResponse = httpClient.execute(httpPost);
        HttpEntity httpEntity = httpResponse.getEntity();
        xml = EntityUtils.toString(httpEntity,"utf-8");

    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (ClientProtocolException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return xml;
}
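As far as I know, EntityUtils.toString(httpEntity) without a charset argument falls back to ISO-8859-1 in Apache HttpClient 4.x when the response does not declare a charset, which would explain the garbled text. The sketch below reproduces that mismatch with plain JDK calls only; the class and method names are hypothetical, and the byte array stands in for the HTTP response body.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Decodes the body with the wrong charset (what an ISO-8859-1 default would do).
    static String decodeLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // Decodes the body with the charset the server actually used.
    static String decodeUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u03A1\u03AC\u03B4\u03B9\u03BF"; // five Greek letters
        byte[] body = original.getBytes(StandardCharsets.UTF_8); // 10 bytes on the wire

        System.out.println(decodeUtf8(body).equals(original));   // true
        System.out.println(decodeLatin1(body).equals(original)); // false
        System.out.println(decodeLatin1(body).length());         // 10: one char per byte
    }
}
```

Passing "utf-8" explicitly, as in getXmlFromUrl above, corresponds to the decodeUtf8 path.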