Question

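I feel like I'm going insane. I want to pretty-print an org.w3c.dom.Document (in Java) without a schema. Indentation is not all I need: I also want useless blank lines and whitespace to be ignored. Somehow this doesn't happen. Whenever I parse XML from a file or write it back to a file, the DOM document contains text nodes that hold only whitespace (\n, spaces, etc.). Is there a simple way to get rid of these, without a schema and without transforming the XML myself by iterating over all the nodes and deleting the empty text nodes?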

Example: my input file looks like this (but with more blank lines):

<mytag>
       <anothertag>content</anothertag>



</mytag>

I want my output file to look like this:

<mytag>
  <anothertag>content</anothertag>
</mytag>

Note: I don't have a schema for the XML (so I have to call builder.setValidating(false)), and I don't have the luxury of an internet connection when running this code.

Thanks!

UPDATE: I found something very close to what I need. Maybe it can help other soldiers fighting XML documents without a schema:

org.apache.axis.utils.XMLUtils.normalize(document);

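Source code here. Calling it after creating the Document and before writing it out with a Transformer produces pretty output, with absolutely no schema validation. JB Nizet also gave me a working answer, but I have the feeling that some validation happens behind the scenes in that code, which would make it different from my use case. I'll keep the question open for a few days in case someone has a better solution.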
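For readers who do not have Axis on the classpath, the workflow described in the update (clean the Document after parsing, then write it with a Transformer) can be sketched with JDK APIs only. This is not the Axis implementation: the helper stripWhitespaceOnlyText and the class name are hypothetical, and the sketch swaps in an XPath query that removes whitespace-only text nodes. The indent-amount property is implementation-specific and is assumed to be understood by the JDK's built-in transformer; other transformers may ignore it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StripWhitespaceDemo {

    // Parse without validation, as in the question.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper (not the Axis code): removes every text node that
    // contains only whitespace and returns how many nodes were removed.
    static int stripWhitespaceOnlyText(Document document) throws Exception {
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", document, XPathConstants.NODESET);
        int count = blanks.getLength();
        for (int i = 0; i < count; i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }
        return count;
    }

    // Serialize with indentation and without an XML declaration.
    static String prettyPrint(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Assumption: implementation-specific property; ignored if unsupported.
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document document = parse(
                "<mytag>\n        <anothertag>content</anothertag>\n\n\n\n</mytag>");
        stripWhitespaceOnlyText(document);
        System.out.println(prettyPrint(document));
    }
}
```

One caveat with this approach: it also drops whitespace that sits between inline elements in mixed content, so it suits data-style XML like the example better than document-style markup.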

Answer 1

Here is a working example:

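import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class Xml {
    private static final String XML =
        "<mytag>\n" +
        "        <anothertag>content</anothertag>\n" +
        "\n" +
        "\n" +
        "\n" +
        "</mytag>";

    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
        // Parse without validation; no schema or DTD is involved.
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        documentBuilderFactory.setValidating(false);
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        Document document = documentBuilder.parse(new InputSource(new StringReader(XML)));

        // Show that the whitespace-only text nodes are still in the DOM.
        NodeList childNodes = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < childNodes.getLength(); i++) {
            System.out.println(childNodes.item(i));
        }

        // Serialize with the DOM Load and Save (LS) API.
        final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        final LSSerializer writer = impl.createLSSerializer();

        writer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);

        System.out.println(writer.writeToString(document));
    }
}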

Output:

[#text: 
        ]
[anothertag: null]
[#text: 



]
<mytag>
    <anothertag>content</anothertag>
</mytag>

So the parser does not validate and it keeps the whitespace text nodes, yet the serializer produces exactly the output you expect.
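Since the question is about files rather than strings, the same serializer can be pointed at a stream through an LSOutput. The following is a minimal sketch under a few assumptions: the class name LsFileDemo and its helper methods are hypothetical, temporary files stand in for the real input and output paths, and UTF-8 is assumed as the encoding. The exact whitespace in the result may vary between JDK versions, so the sketch makes no claim about it.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class LsFileDemo {

    // Parse a file without validation.
    static Document parseFile(Path input) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        return factory.newDocumentBuilder().parse(input.toFile());
    }

    // Write the document to a byte stream with the same serializer settings
    // as the example above.
    static void write(Document document, OutputStream target) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        writer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);

        LSOutput output = impl.createLSOutput();
        output.setEncoding("UTF-8"); // assumption: UTF-8 output is wanted
        output.setByteStream(target);
        writer.write(document, output);
    }

    public static void main(String[] args) throws Exception {
        // Temporary files stand in for the real input and output paths.
        Path input = Files.createTempFile("input", ".xml");
        Path output = Files.createTempFile("output", ".xml");
        Files.write(input,
                "<mytag>\n        <anothertag>content</anothertag>\n\n\n\n</mytag>"
                        .getBytes(StandardCharsets.UTF_8));

        Document document = parseFile(input);
        try (OutputStream out = Files.newOutputStream(output)) {
            write(document, out);
        }
        System.out.println(new String(Files.readAllBytes(output), StandardCharsets.UTF_8));
    }
}
```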
