I read the answer here: Normalization in DOM parsing with java - how does it work?
I understand that normalization removes empty text nodes and merges adjacent ones, so I tried the following XML:
<company>hello
wor
ld
</company>
with the following code:
try {
    DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder();
    Document doc = dBuilder.parse(file);
    doc.getDocumentElement().normalize();
    System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
    System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    System.out.println(doc.getDocumentElement().getChildNodes().item(0).getTextContent());
} catch (Exception e) {
    e.printStackTrace();
}
Even without calling normalize(), I always get 1 child node for the "company" element. The result is:
Root element :company
1
hello
wor
ld
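So what is going on here? Can anyone explain? Shouldn't I get hello world on a single line?
Answer 0 (score: 1)
The parser already creates a normalized DOM tree.
The normalize() method is useful when you build or modify a DOM yourself, because those operations may leave the tree in a non-normalized state; in that case the method normalizes it for you.
Note that normalization only merges adjacent text nodes and removes empty ones. It does not touch whitespace inside a text node, so the line breaks in your element's text are preserved.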
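To tie this back to the XML in the question, here is a minimal sketch. The class name WhitespaceDemo and the helpers parseRoot and collapseWhitespace are made up for illustration; they are not part of the DOM API. It parses the same content from a string, checks that normalize() leaves the text unchanged, and then collapses the whitespace explicitly, which is the step you would have to do yourself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class WhitespaceDemo {

    // Hypothetical helper: parse an XML string and return its root element.
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    // Hypothetical helper: trim the ends and collapse each whitespace run to one space.
    static String collapseWhitespace(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<company>hello\nwor\nld\n</company>");

        String before = root.getTextContent();
        root.normalize();
        String after = root.getTextContent();

        // The parser already produced a single text node.
        System.out.println(root.getChildNodes().getLength());
        // normalize() did not change the text, line breaks included.
        System.out.println(before.equals(after));
        // Whitespace has to be collapsed explicitly.
        System.out.println(collapseWhitespace(after));
    }
}
```

This should print 1, true, and hello wor ld. The last line is not "hello world" because the line break between "wor" and "ld" is real whitespace in the document, and collapsing turns it into a space rather than removing it.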
Common helper
// Prints the node and, recursively, all of its descendants, indenting one space per level.
private static void printDom(String indent, Node node) {
    System.out.println(indent + node);
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
        printDom(indent + " ", child);
}
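Example 1
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public static void main(String[] args) throws Exception {
    String xml = "<Root>text 1<!-- test -->text 2</Root>";
    DocumentBuilder domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = domBuilder.parse(new InputSource(new StringReader(xml)));
    printDom("", doc);      // as parsed: text, comment, text
    deleteComments(doc);
    printDom("", doc);      // two adjacent text nodes remain
    doc.normalizeDocument();
    printDom("", doc);      // adjacent text nodes merged into one
}
private static void deleteComments(Node node) {
    if (node.getNodeType() == Node.COMMENT_NODE) {
        node.getParentNode().removeChild(node);
    } else {
        // The NodeList is live, so walk it backwards: removing a child
        // then never causes the next sibling to be skipped.
        NodeList children = node.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--)
            deleteComments(children.item(i));
    }
}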
Output
[#document: null]
 [Root: null]
  [#text: text 1]
  [#comment:  test ]
  [#text: text 2]
[#document: null]
 [Root: null]
  [#text: text 1]
  [#text: text 2]
[#document: null]
 [Root: null]
  [#text: text 1text 2]
Example 2
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public static void main(String[] args) throws Exception {
    DocumentBuilder domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = domBuilder.newDocument();
    Element root = doc.createElement("Root");
    doc.appendChild(root);
    // Appending text nodes one by one leaves three adjacent text nodes.
    root.appendChild(doc.createTextNode("Hello"));
    root.appendChild(doc.createTextNode(" "));
    root.appendChild(doc.createTextNode("World"));
    printDom("", doc);
    doc.normalizeDocument();
    printDom("", doc);
}
Output
[#document: null]
 [Root: null]
  [#text: Hello]
  [#text:  ]
  [#text: World]
[#document: null]
 [Root: null]
  [#text: Hello World]
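The examples above call normalizeDocument(), while the question calls normalize() on the document element. As a small sketch (the class name EmptyTextNodeDemo and the helper buildRoot are made up for illustration), the following shows that Node.normalize() on an element merges adjacent text nodes in the same way and also drops an empty text node, by counting the children before and after the call.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EmptyTextNodeDemo {

    // Hypothetical helper: builds <Root> with three text children, the middle one empty.
    static Element buildRoot() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("Root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("Hello"));
        root.appendChild(doc.createTextNode(""));
        root.appendChild(doc.createTextNode("World"));
        return root;
    }

    public static void main(String[] args) throws Exception {
        Element root = buildRoot();
        // Child count of the hand-built, non-normalized tree.
        System.out.println(root.getChildNodes().getLength());
        root.normalize();
        // Child count and text after normalizing the element's subtree.
        System.out.println(root.getChildNodes().getLength());
        System.out.println(root.getFirstChild().getNodeValue());
    }
}
```

With the JDK's default DOM implementation this should print 3, then 1, then HelloWorld. There is no space in the result because none of the three text nodes contained one.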