Question

This question got me very close, and its solution does work. Now I'm trying to understand it better and make it more robust.

Given the following test code:

// Just build a test xml
String xml;
xml = "<aaa Batt = \"That\" Aatt=\"this\" >\n";
xml += "<!-- Document comment --><bbb moarttt=\"fasf\" lolol=\"dsf\"/>\n";
xml += "         <ccc/></aaa>";

// do the necessary bureaucracy
DocumentBuilder docBuilder;
docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc;
doc = docBuilder.parse(new ByteArrayInputStream(xml.getBytes()));

// Normalize document
// Do I really need to do this?
doc.normalize();

// Canonicalize using Apache's XML Security
org.apache.xml.security.Init.init(); // Doesn't work if I don't do this.
byte[] c14nOutputbytes = Canonicalizer.getInstance(
        Canonicalizer.ALGO_ID_C14N_EXCL_WITH_COMMENTS)
        .canonicalizeSubtree(doc.getDocumentElement());
// Re-parse, as was recommended, to get the attributes in alphabetical order
Document canon = docBuilder.parse(new ByteArrayInputStream(c14nOutputbytes));

// Input and output for the transformer
DOMSource xmlInput = new DOMSource(canon);
StreamResult xmlOutput = new StreamResult(new StringWriter());

// Configure transformer and format code
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(
    "{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(xmlInput, xmlOutput);

// And print it
System.out.println(xmlOutput.getWriter().toString());

Running this code prints:

<aaa Aatt="this" Batt="That">
<!-- Document comment --><bbb lolol="dsf" moarttt="fasf"/>
         <ccc/>
</aaa>

This may well be canonicalized, but it doesn't seem to respect the indentation I asked the Transformer to apply.

With an example like this, I have several questions:

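For my purposes, is there any difference between .normalize() and Canonicalizer.ALGO_ID_C14N_EXCL_WITH_COMMENTS? Removing either of them seems to yield the same result (again, my goal is a canonical, pretty-printed XML).
Why does the whitespace in the XML seem to break the formatting? Do I have to trim the text of every XML node for this to work? That doesn't sound right, but if the input XML is <aaa Batt = \"That\" Aatt=\"this\" ><bbb moarttt=\"fasf\" lolol=\"dsf\"/><ccc/></aaa>, then the XML is formatted perfectly.
Why are tags such as <ccc/> not expanded to <ccc></ccc> after asking for the canonical form? Wikipedia says "empty elements are encoded as start/end pairs, not using the special empty-element syntax".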
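
As a possible way to test the whitespace question above, here is a minimal sketch that removes whitespace-only text nodes before handing the document to the Transformer. It assumes that the leftover newlines and spaces between elements are what keeps the serializer from re-indenting. The class name WhitespaceStripDemo and its helper methods are hypothetical, and only JDK classes are used (the Apache canonicalization step is left out to keep the sketch self-contained).

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class WhitespaceStripDemo {

    // Parse an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Select every text node that consists only of whitespace.
    static NodeList whitespaceOnlyTextNodes(Document doc) throws Exception {
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
    }

    // Remove those nodes and return how many were removed.
    static int stripWhitespaceOnlyTextNodes(Document doc) throws Exception {
        NodeList blanks = whitespaceOnlyTextNodes(doc);
        int count = blanks.getLength();
        for (int i = 0; i < count; i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }
        return count;
    }

    // Same Transformer configuration as in the test code above.
    static String prettyPrint(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<aaa Batt = \"That\" Aatt=\"this\" >\n"
                + "<!-- Document comment --><bbb moarttt=\"fasf\" lolol=\"dsf\"/>\n"
                + "         <ccc/></aaa>";
        Document doc = parse(xml);
        stripWhitespaceOnlyTextNodes(doc);
        System.out.println(prettyPrint(doc));
    }
}
```

If the assumption holds, the result should be indented the same way as for the input written on a single line. The exact layout depends on the JDK's built-in serializer, so no expected output is shown here.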

Sorry if this is too many questions at once, but I feel the answers to all of them should be more or less the same.

XML Canonical form in Java

0 answers: