Question

public XMLParser(InputStream is) {
    try {
        // Build a DOM parser and read the whole stream into a Document.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(is);
        // Keep the root element for later lookups.
        node = doc.getDocumentElement();
    } catch (Exception e) {
        DebugLog.log(e);
    }
}

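The inputStream contains the following: "Hey, this is a &uuml; character." The entity '&uuml;' stands for 'ü'.

When I read the node's content with System.out.println(node.getTextContent()), I get "Hey, this is a character." The &uuml; is cut off.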

Answer 1

Hmm, is this a valid document? Is an encoding specified? -> http://www.w3schools.com/XML/xml_encoding.asp

These might help:

Howto let the SAX parser determine the encoding from the xml declaration? http://www.coderanch.com/t/127052/XML/XML-parsers-encoding-byte-order
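
To illustrate the point about declaring an encoding, here is a minimal sketch. It assumes the document is ISO-8859-1 and contains a raw 'ü' byte; the class and method names (EncodingDeclarationDemo, parseText, latin1Sample) are made up for this example and are not part of the original code. When the parser is handed a byte stream, it reads the encoding from the XML declaration, so no extra configuration should be needed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not from the original post.
class EncodingDeclarationDemo {

    // Parses raw bytes and returns the text content of the root element.
    // The parser picks up the encoding from the XML declaration.
    static String parseText(byte[] xmlBytes) {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    // Sample document encoded as ISO-8859-1, with a matching declaration.
    // \u00FC is the u-umlaut character; it becomes the single byte 0xFC.
    static byte[] latin1Sample() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<msg>Hey, this is a \u00FC character.</msg>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

If the declaration were missing, the parser would assume UTF-8 and would most likely reject the lone 0xFC byte, which is why the declaration matters for non-UTF-8 input.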

Answer 2

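The problem was XML entities versus HTML entities. I was requesting a web page that returned data containing HTML entities. I had to convert the HTML entities to XML entities, and then it worked!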

Check this answer for some code
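
As a rough illustration of that conversion, the sketch below rewrites HTML named entities such as &uuml; into numeric character references such as &#252;, which any XML parser accepts without a DTD. The class HtmlEntityFixer, its methods, and the small entity table are assumptions made for this example; a real fix would need a fuller table or a library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not from the original post.
class HtmlEntityFixer {

    // Small illustrative subset of HTML named entities; extend as needed.
    private static final Map<String, Integer> HTML_ENTITIES = new HashMap<>();
    static {
        HTML_ENTITIES.put("auml", 0xE4);
        HTML_ENTITIES.put("ouml", 0xF6);
        HTML_ENTITIES.put("uuml", 0xFC);
        HTML_ENTITIES.put("Auml", 0xC4);
        HTML_ENTITIES.put("Ouml", 0xD6);
        HTML_ENTITIES.put("Uuml", 0xDC);
        HTML_ENTITIES.put("szlig", 0xDF);
        HTML_ENTITIES.put("nbsp", 0xA0);
    }

    private static final Pattern NAMED_ENTITY =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces known HTML entities with numeric references (&#NNN;).
    // Unknown names, including XML's own &amp; and &lt;, are left untouched.
    static String toNumericReferences(String text) {
        Matcher m = NAMED_ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = HTML_ENTITIES.get(m.group(1));
            String replacement =
                    (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts the entities, then parses and returns the root element's text.
    static String parseText(String xml) {
        try {
            byte[] bytes = toNumericReferences(xml).getBytes(StandardCharsets.UTF_8);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }
}
```

With this in place, HtmlEntityFixer.parseText("<msg>Hey, this is a &uuml; character.</msg>") should return the sentence with the 'ü' intact. Working on the text before parsing is a design choice made for simplicity here; declaring the entities in a DTD would be another option.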

XMLParser encoding problem
