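This question got me very close, and the approach does work. Now I am trying to understand it better and make it more robust.
Given the following test code: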
// Just build a test xml
String xml;
xml = "<aaa Batt = \"That\" Aatt=\"this\" >\n";
xml += "<!-- Document comment --><bbb moarttt=\"fasf\" lolol=\"dsf\"/>\n";
xml += " <ccc/></aaa>";
// do the necessary bureaucracy
DocumentBuilder docBuilder;
docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc;
doc = docBuilder.parse(new ByteArrayInputStream(xml.getBytes()));
// Normalize document
// Do I really need to do this?
doc.normalize();
// Canonicalize using Apache's XML Security library
org.apache.xml.security.Init.init(); // Doesn't work if I don't do this.
byte[] c14nOutputbytes = Canonicalizer.getInstance(
Canonicalizer.ALGO_ID_C14N_EXCL_WITH_COMMENTS)
.canonicalizeSubtree(doc.getDocumentElement());
// This reparse was recommended to get the attributes in alphabetical order
Document canon = docBuilder.parse(new ByteArrayInputStream(c14nOutputbytes));
// Input and output for the transformer
DOMSource xmlInput = new DOMSource(canon);
StreamResult xmlOutput = new StreamResult(new StringWriter());
// Configure transformer and format code
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(
"{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(xmlInput, xmlOutput);
// And print it
System.out.println(xmlOutput.getWriter().toString());
Executing this code prints:
<aaa Aatt="this" Batt="That">
<!-- Document comment --><bbb lolol="dsf" moarttt="fasf"/>
<ccc/>
</aaa>
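This may well be canonicalized, but it does not seem to respect the indentation I asked the transformer to apply.
Given this example, I have a few questions: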
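1. Is there any difference between .normalize() and Canonicalizer.ALGO_ID_C14N_EXCL_WITH_COMMENTS? Removing either one seems to produce the same result (given that my goal is canonical, pretty-printed XML).
2. Why is the XML not indented properly? If I change the input to the single line <aaa Batt = \"That\" Aatt=\"this\" ><!-- Document comment --><bbb moarttt=\"fasf\" lolol=\"dsf\"/><ccc/></aaa>, then the XML is formatted perfectly.
3. Why are tags such as <ccc/> not expanded to <ccc></ccc>? Wikipedia says "empty elements are encoded as start/end pairs, not using the special empty-element syntax".
Sorry if these are too many questions at once, but I feel the answers to all of them should be closely related.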
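To narrow down the first two questions, the DOM itself can be inspected using only JDK classes. The following is a minimal sketch, not a definitive diagnosis: the class name WhitespaceCheck and its helper methods are hypothetical names introduced here for illustration. It counts the whitespace-only text nodes that the newlines in the multi-line input leave under the root element, and it shows what normalize() does on its own, namely merging adjacent text nodes. That those whitespace nodes are what defeats the transformer's indentation is an assumption that this check only makes plausible.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, introduced only for this illustration.
public class WhitespaceCheck {

    // Parse an XML string with the default (non-validating) JDK parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Count direct children of the root element that are whitespace-only text nodes.
    static int whitespaceOnlyChildren(Document doc) {
        NodeList children = doc.getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Append two adjacent text nodes, then report the root's child count
    // before and after normalize(), which merges adjacent text nodes.
    static int[] childCountAroundNormalize() throws Exception {
        Document doc = parse("<r/>");
        Element root = doc.getDocumentElement();
        root.appendChild(doc.createTextNode("a"));
        root.appendChild(doc.createTextNode("b"));
        int before = root.getChildNodes().getLength();
        doc.normalize();
        int after = root.getChildNodes().getLength();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws Exception {
        String attrs = "<aaa Batt = \"That\" Aatt=\"this\" >";
        String multiLine = attrs + "\n"
                + "<!-- Document comment --><bbb moarttt=\"fasf\" lolol=\"dsf\"/>\n"
                + " <ccc/></aaa>";
        String singleLine = attrs
                + "<!-- Document comment --><bbb moarttt=\"fasf\" lolol=\"dsf\"/>"
                + "<ccc/></aaa>";

        System.out.println("multi-line:  " + whitespaceOnlyChildren(parse(multiLine)));
        System.out.println("single-line: " + whitespaceOnlyChildren(parse(singleLine)));

        int[] counts = childCountAroundNormalize();
        System.out.println("normalize(): " + counts[0] + " -> " + counts[1]);
    }
}
```

The multi-line input carries two whitespace-only text nodes under the root element and the single-line input carries none, which matches the difference in formatting between the two runs. The normalize() check goes from two text nodes to one, which suggests that normalize() only tidies the text nodes of the in-memory tree and is unrelated to the serialization rules that canonicalization applies.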