Question

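I am trying to parse XML that contains non-English letters (æ, ø, and å, to be specific), but I am having problems parsing them. I don't get any errors, but the letters come out wrong: instead of æ I get Ã¦, instead of å I get Ã¥, and instead of ø I get Ã¸. I also noticed that the character - isn't displayed properly. I know I could call .replaceAll for the three letters, but I'm not sure whether the problem is a mistake I have made somewhere, or whether it simply can't be done without going down the replaceAll route.

Code: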

    private Document getDomElement(String xml) {
        Document doc = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {

            DocumentBuilder db = dbf.newDocumentBuilder();

            InputSource is = new InputSource(new ByteArrayInputStream(
                    xml.getBytes()));
            // is.setCharacterStream(new StringReader(xml));
            is.setEncoding("UTF-8");
            Log.i(TAG, "Encoding: " + is.getEncoding());
            doc = db.parse(is);

        } catch (ParserConfigurationException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (SAXException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (IOException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        }
        // return DOM
        return doc;
    }

    private String getValue(Element item, String str) {
        NodeList n = item.getElementsByTagName(str);
        return this.getElementValue(n.item(0));
    }

    private final String getElementValue(Node elem) {
        Node child;
        if (elem != null) {
            if (elem.hasChildNodes()) {
                for (child = elem.getFirstChild(); child != null; child = child
                        .getNextSibling()) {
                    if (child.getNodeType() == Node.TEXT_NODE) {
                        return child.getNodeValue();
                    }
                }
            }
        }
        return "";
    }
}

Let me know if you need to see more code than this.

Thanks for any suggestions.

Answer 1

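The problem is that you convert the String argument to bytes with getBytes(). You are better off not converting to bytes at all:

    InputSource is = new InputSource(new StringReader(xml));

I see you have that approach commented out in your code. Was there a reason you didn't want to use it?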

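If you do use a byte array, it is better to name the encoding explicitly:

    InputSource is = new InputSource(new ByteArrayInputStream(
            xml.getBytes("UTF-8")));

Without an argument, getBytes() uses the platform default charset, and on older versions of Android that default depends on the locale.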
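To make the two options above concrete, here is a minimal, self-contained sketch that parses the same string both ways and reads the text back. The class name XmlCharsetDemo, the helper names, and the sample element are hypothetical and not taken from the question. It also uses StandardCharsets.UTF_8 rather than the "UTF-8" string name shown above, which assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
class XmlCharsetDemo {

    // Option 1: parse from characters, so no byte encoding is involved.
    static Document parseFromReader(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Option 2: parse from bytes, encoding explicitly as UTF-8 and
    // declaring the same encoding on the InputSource.
    static Document parseFromUtf8Bytes(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            InputSource is = new InputSource(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            is.setEncoding("UTF-8");
            return db.parse(is);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes for ae, o-slash, a-ring keep this source file ASCII-only.
        String expected = "\u00e6\u00f8\u00e5";
        String xml = "<name>" + expected + "</name>";

        String viaReader = parseFromReader(xml).getDocumentElement().getTextContent();
        String viaBytes = parseFromUtf8Bytes(xml).getDocumentElement().getTextContent();

        System.out.println("reader round-trip intact: " + expected.equals(viaReader));
        System.out.println("bytes round-trip intact:  " + expected.equals(viaBytes));
    }
}
```

The reader-based version is the simpler of the two because the string never leaves the character domain. The byte-based version only works when the charset used for encoding and the one the parser is told about agree.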

Answer 2

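What you are doing assumes that the platform default encoding is "UTF-8"; I think it may actually be "UTF-16".

Try passing the same encoding name to xml.getBytes() as you pass to is.setEncoding().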
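As an illustration of why the two encodings must match, the sketch below encodes a string with one charset and decodes the bytes with another. Encoding as UTF-8 and decoding as ISO-8859-1 produces the kind of output described in the question (Ã¸ in place of ø). That the question's data went through exactly this pair of charsets is an assumption, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class MojibakeDemo {

    // Encode text with one charset, then decode the resulting bytes with another.
    static String reinterpret(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Unicode escapes for ae, o-slash, a-ring keep this source file ASCII-only.
        String original = "\u00e6\u00f8\u00e5";

        // Mismatched charsets: each two-byte UTF-8 sequence becomes two characters.
        String garbled = reinterpret(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00c3\u00a6\u00c3\u00b8\u00c3\u00a5")); // true

        // Matching charsets: the text survives the round trip.
        String intact = reinterpret(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(original)); // true
    }
}
```

Each of the three letters takes two bytes in UTF-8, which is why every letter turns into a pair of characters starting with Ã when the bytes are read as a single-byte charset.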
