Java DOM does not recognize CDATA

Time: 2013-04-17 05:53:33

Tags: xml parsing dom

I have an XML document that contains a CDATA section. This is the XML I am trying to load into a Java DOM:

<?xml version="1.0" encoding="utf-8"?><search:Search xmlns:search="Search"><search:Response xmlns="Search"><search:Store xmlns="Search">
<search:Result xmlns="Search">
<search:Properties xmlns="Search">
<email2:ConversationId xmlns:email2="Email2"><![CDATA["B3:5F:18:81:37:4B:E4:4C:97:CE:9A:5A:18:6E:DE:8D:"]]></email2:ConversationId>
<email:Categories xmlns:email="Email"></email:Categories>
</search:Properties>
</search:Result>
</search:Store>
</search:Response>
 </search:Search>

Here is the code that loads it:

import org.w3c.dom.Attr;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;
...
...
    try {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS)registry.getDOMImplementation("LS");
        LSParser builder = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        DOMInputImpl input = new DOMInputImpl();
        input.setByteStream(new ByteArrayInputStream(xmlString.getBytes("utf-8")));
        xmlDoc = builder.parse(input);
        return xmlDoc;
    } catch (ClassNotFoundException | InstantiationException
            | IllegalAccessException | ClassCastException | UnsupportedEncodingException e) {
        throw new MyException(e);
    }

However, the parsed document contains no node of the CDATA type (org.w3c.dom.CDATASection). Instead, the node shows up as #text.

Any help is much appreciated.

1 answer:

Answer 0: (score: 0)

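Interesting! LSParser "converts" CDATA section nodes into text nodes (this is probably stated somewhere in the specification). However, if you use the JAXP API (which is far less verbose), you get #cdata-section: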

    DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
    f.setNamespaceAware(true);
    DocumentBuilder builder = f.newDocumentBuilder();
    Document doc = builder.parse(...);
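
For a self-contained check, the sketch below parses a small sample document both ways and prints the name of the node inside the element. The class name `CdataDemo`, the helper method names, and the sample XML are made up for illustration. The `LSParser` variant additionally sets the `cdata-sections` parameter on the parser's `DOMConfiguration`. This rests on the assumption that the DOM Level 3 Load and Save specification defaults `infoset` to true for `LSParser`, which in turn switches `cdata-sections` off, so verify it against your parser implementation rather than treating it as a guaranteed fix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and sample data are illustrative only.
class CdataDemo {

    // Minimal sample: <root> holds <id>, whose only child is a CDATA section.
    static final String XML = "<root><id><![CDATA[B3:5F:18]]></id></root>";

    // Parses with JAXP (as in the answer) and returns the first child of <id>.
    static Node firstChildViaJaxp(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            DocumentBuilder builder = f.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getFirstChild().getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses with LSParser, asking it to keep CDATA sections,
    // and returns the first child of <id>.
    static Node firstChildViaLs(String xml) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            // Assumption: CDATA sections are dropped by default, so request them explicitly.
            parser.getDomConfig().setParameter("cdata-sections", Boolean.TRUE);
            LSInput input = impl.createLSInput();
            input.setStringData(xml);
            Document doc = parser.parse(input);
            return doc.getDocumentElement().getFirstChild().getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("JAXP:     " + firstChildViaJaxp(XML).getNodeName());
        System.out.println("LSParser: " + firstChildViaLs(XML).getNodeName());
    }
}
```

If only the text is needed, `getTextContent()` returns the same string for a text node and a CDATA section, so code that just reads the value does not have to care about the node type.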