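I am trying to remove nodes from a large XML file. With the code below, the markup of other elements gets changed as well. I hope someone can explain why this happens or how to fix it.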
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document document = dbf.newDocumentBuilder().parse(new File(filePath)); //filePath - source file
/*while (document.getElementsByTagName("IMFile").getLength() != 0){
//Loop until all matching children are removed
Element element = (Element) document.getElementsByTagName("IMFile").item(0);
element.getParentNode().removeChild(element);
}*/
//Test for first appearance
Element element = (Element) document.getElementsByTagName("IMFile").item(0);
element.getParentNode().removeChild(element);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
t.transform(new DOMSource(document), new StreamResult(new File(filePath+"_New"))); //destination
It changes the order of attributes, for example:
<Attribute id="7" value="1920" name="width"/>
becomes <Attribute id="7" name="width" value="1920"/>
It also collapses some empty open/close tag pairs:
<PowerPointFilename></PowerPointFilename>
becomes <PowerPointFilename/>
Answer 0 (score: 0)
You can modify the XML document with a SAX-based transformer, which keeps the attribute order:
public static void main(String[] args) throws IOException, TransformerException, SAXException {
XMLReader reader = XMLReaderFactory.createXMLReader();
TransformerFactory tf = TransformerFactory.newInstance();
// Load the transformer definition from the file strip.xsl:
Transformer t = tf.newTransformer(new SAXSource(reader, new InputSource(new FileInputStream("strip.xsl"))));
// Transform the file test.xml to stdout:
t.transform(new SAXSource(reader, new InputSource(new FileInputStream("test.xml"))), new StreamResult(System.out));
}
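A side note on XMLReaderFactory.createXMLReader() in the snippet above: it has been deprecated since Java 9 in favor of SAXParserFactory. A minimal sketch of the replacement follows; the class and method names (ReaderSketch, newNamespaceAwareReader) are made up for illustration. Namespace awareness is switched on explicitly because SAXParserFactory leaves it off by default, and an XSLT stylesheet cannot be parsed without it.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of the original answer.
class ReaderSketch {
    // Builds an XMLReader through JAXP instead of the deprecated XMLReaderFactory.
    static XMLReader newNamespaceAwareReader()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // Off by default; needed so the xsl: prefix in the stylesheet is resolved.
        spf.setNamespaceAware(true);
        return spf.newSAXParser().getXMLReader();
    }
}
```

The returned reader could then be passed to the two SAXSource constructors in place of `reader`.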
Here is an XSL transformation that strips the IMFile elements:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Copy -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- Strip IMFile elements -->
<xsl:template match="IMFile"/>
</xsl:stylesheet>
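To try the stylesheet without touching any files, it can be applied to an in-memory string. The following is only a sketch under a few assumptions: the class name StripImFileDemo, the helper strip, and the sample document are invented for illustration, and the stylesheet is inlined as a string instead of being loaded from strip.xsl. It uses StreamSource rather than SAXSource for brevity, so it demonstrates the removal of IMFile elements and makes no claim about attribute order or about how empty elements are serialized.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of the original answer.
class StripImFileDemo {
    // Same identity-plus-strip stylesheet as above, inlined for the demo.
    static final String STRIP_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"node()|@*\">"
      + "<xsl:copy><xsl:apply-templates select=\"node()|@*\"/></xsl:copy>"
      + "</xsl:template>"
      // More specific than node(), so this empty template wins for IMFile.
      + "<xsl:template match=\"IMFile\"/>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the result as a string.
    static String strip(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STRIP_XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Made-up sample input containing one IMFile element to remove.
        String sample = "<root><IMFile>x</IMFile>"
            + "<Attribute id=\"7\" value=\"1920\" name=\"width\"/></root>";
        System.out.println(strip(sample));
    }
}
```

For a large file, the same Transformer can be fed file-based sources and a file-based StreamResult, as in the main method shown earlier.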