Completely delete a node when a specific child element is found

Time: 2014-09-18 16:06:13

Tags: java xml

When a specific child element is found in an XML node, I need to delete the whole node (or nodes). For instance, my XML is as follows:

<?xml version="1.0"?>
<booklist>
  <book>
    <name>THEORY OF DYNAMICS</name>
    <author>JOHN</author>
    <price>09786</price>
  </book>
  <book>
    <name>ABCD</name>
    <author>STACEY</author>
    <price>765</price>
  </book>
  <book>
    <name>ABCD</name>
    <author>BTYSON</author>
    <price>34974</price>
  </book>
  <book>
    <name>ABCD</name>
    <author>CTYSON</author>
    <price>09534</price>
  </book>
  <book>
    <name>INTRODUCING JAVA</name>
    <author>CHARLES</author>
    <price>1234</price>
  </book>
  <book>
    <name>ABCD</name>
    <author>TYSON</author>
    <price>34534</price>
  </book>
</booklist>

So when I search for books whose <name> is 'ABCD', I want the result to look like this:

OUTPUT XML:

<?xml version="1.0"?>
<booklist>
  <book>
    <name>THEORY OF DYNAMICS</name>
    <author>JOHN</author>
    <price>09786</price>
  </book>
  <book>
    <name>INTRODUCING JAVA</name>
    <author>CHARLES</author>
    <price>1234</price>
  </book>
</booklist>

The code I tried is as follows:

try {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = factory.newDocumentBuilder();
    Document doc = docBuilder.parse(new File(FILENAME));
    NodeList list = doc.getElementsByTagName("*");
    for (int i = 0; i < list.getLength(); i++) {
        Node node = list.item(i);
        // Searching through entire file
        if (node.getNodeName().equalsIgnoreCase("book")) {
            NodeList childList = node.getChildNodes();
            // Looking through all child nodes
            for (int x = 0; x < childList.getLength(); x++) {
                Node child = childList.item(x);
                // Only look at the <name> children of each <book>
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && child.getNodeName().equalsIgnoreCase("name")
                        && child.getTextContent().equalsIgnoreCase("abcd")) {
                    // Delete node here
                    node.getParentNode().removeChild(node);
                }
            }
        }
    }

    // Write the modified document to NEWFILE
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    DOMSource source = new DOMSource(doc);
    StreamResult result = new StreamResult(new File(NEWFILE));
    transformer.transform(source, result);
} catch (ParserConfigurationException pce) {
    pce.printStackTrace();
} catch (TransformerException te) {
    te.printStackTrace();
} catch (IOException ioe) {
    ioe.printStackTrace();
} catch (SAXException saxe) {
    saxe.printStackTrace();
}

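I can't delete all the book nodes whose <name> child is "abcd"; only some of them, roughly every other one, get deleted. Can you tell me what is wrong with my code? Why can't I delete all book nodes named 'abcd'?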

1 answer:

Answer 0 (score: 1)

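The DOM spec says:

"NodeList and NamedNodeMap objects in the DOM are live; that is, changes to the underlying document structure are reflected in all relevant NodeList and NamedNodeMap objects. For example, if a DOM user gets a NodeList object containing the children of an Element, then subsequently adds more children to that element (or removes children, or modifies them), those changes are automatically reflected in the NodeList, without further action on the user's part."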

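So when you iterate over the NodeList list and remove nodes from the document, those changes are immediately reflected in the NodeList, and the indices inside it shift. After you remove the <book> at index i, the next remaining element moves up into index i, and the loop's i++ then skips over it. That is why you never visit all elements and only some of the matching books get deleted.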
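To see the skipping described above in isolation, here is a small self-contained sketch. It assumes the book list is embedded as a string (trimmed to name and author) instead of being read from FILENAME, and the class and method names are hypothetical. It iterates over getElementsByTagName("book") rather than "*", but the live list shifts in the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LiveNodeListDemo {

    // The question's booklist, trimmed to <name> and <author> (assumption for brevity).
    static final String XML =
        "<booklist>"
        + "<book><name>THEORY OF DYNAMICS</name><author>JOHN</author></book>"
        + "<book><name>ABCD</name><author>STACEY</author></book>"
        + "<book><name>ABCD</name><author>BTYSON</author></book>"
        + "<book><name>ABCD</name><author>CTYSON</author></book>"
        + "<book><name>INTRODUCING JAVA</name><author>CHARLES</author></book>"
        + "<book><name>ABCD</name><author>TYSON</author></book>"
        + "</booklist>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // The problematic pattern: removing nodes while walking the live list forwards.
    static List<String> removeWhileIteratingForward(Document doc) {
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String name = book.getElementsByTagName("name").item(0).getTextContent();
            if (name.equalsIgnoreCase("abcd")) {
                // The live list shrinks here: the next <book> moves to index i,
                // and the following i++ skips it.
                book.getParentNode().removeChild(book);
            }
        }
        return authors(doc);
    }

    // Authors of the books that are still in the document, in document order.
    static List<String> authors(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList authorNodes = doc.getElementsByTagName("author");
        for (int i = 0; i < authorNodes.getLength(); i++) {
            result.add(authorNodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [JOHN, BTYSON, CHARLES]: BTYSON's "ABCD" book was skipped
        System.out.println(removeWhileIteratingForward(parse(XML)));
    }
}
```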

One solution is to first collect all the nodes you want to delete, and then remove them in a separate loop:

// ...

Document doc = docBuilder.parse(new File(FILENAME));
NodeList list = doc.getElementsByTagName("book");

// Collection of nodes to delete
List<Node> delete = new ArrayList<Node>();

for (int i = 0; i < list.getLength(); i++) {
    Node node = list.item(i);
    NodeList childList = node.getChildNodes();

    // Look through all child nodes of this <book>
    for (int x = 0; x < childList.getLength(); x++) {
        Node child = childList.item(x);

        // Only consider the <name> element children
        if (child.getNodeType() == Node.ELEMENT_NODE
                && child.getNodeName().equalsIgnoreCase("name")
                && child.getTextContent().equalsIgnoreCase("abcd")) {
            // Just mark the node as "to be deleted"; don't touch the live list yet
            delete.add(node);
            break;
        }
    }
}

// Now delete the collected nodes; the loop above has already finished
for (Node node : delete) {
    node.getParentNode().removeChild(node);
}

// ...
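For reference, here is a self-contained sketch of the same collect-then-delete idea that runs without a file. It assumes the XML is held in a string (only the <name> values are kept), and the helpers hasName, removeBooksNamed and names are hypothetical names chosen for this example. The first pass only reads the live NodeList; the removals happen on an ArrayList snapshot, which does not change when the document does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CollectThenDelete {

    // The <name> values from the question's booklist (other children omitted).
    static final String XML =
        "<booklist>"
        + "<book><name>THEORY OF DYNAMICS</name></book>"
        + "<book><name>ABCD</name></book>"
        + "<book><name>ABCD</name></book>"
        + "<book><name>ABCD</name></book>"
        + "<book><name>INTRODUCING JAVA</name></book>"
        + "<book><name>ABCD</name></book>"
        + "</booklist>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // True if the node has a <name> element child whose text equals target, ignoring case.
    static boolean hasName(Node book, String target) {
        NodeList children = book.getChildNodes();
        for (int x = 0; x < children.getLength(); x++) {
            Node child = children.item(x);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && child.getNodeName().equals("name")
                    && child.getTextContent().trim().equalsIgnoreCase(target)) {
                return true;
            }
        }
        return false;
    }

    // Removes every matching <book> and returns how many were removed.
    static int removeBooksNamed(Document doc, String target) {
        NodeList books = doc.getElementsByTagName("book");
        // Pass 1: only read the live NodeList, never modify the document.
        List<Node> toDelete = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            if (hasName(books.item(i), target)) {
                toDelete.add(books.item(i));
            }
        }
        // Pass 2: iterate the snapshot; removing nodes cannot shift its indices.
        for (Node book : toDelete) {
            book.getParentNode().removeChild(book);
        }
        return toDelete.size();
    }

    // Text of every remaining <name> element, in document order.
    static List<String> names(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList nameNodes = doc.getElementsByTagName("name");
        for (int i = 0; i < nameNodes.getLength(); i++) {
            result.add(nameNodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(removeBooksNamed(doc, "ABCD")); // prints 4
        System.out.println(names(doc)); // prints [THEORY OF DYNAMICS, INTRODUCING JAVA]
    }
}
```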

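Alternatively, you can iterate over the list backwards, from list.getLength() - 1 down to 0. Removing the item at index i only shifts the items after it, and those have already been processed.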
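A minimal sketch of the backward loop, under the same assumptions as before (XML held in a string, hypothetical class and method names). Because the loop only ever moves towards lower indices, no unvisited <book> changes position when one is removed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BackwardRemoval {

    // The <name> values from the question's booklist (other children omitted).
    static final String XML =
        "<booklist>"
        + "<book><name>THEORY OF DYNAMICS</name></book>"
        + "<book><name>ABCD</name></book>"
        + "<book><name>ABCD</name></book>"
        + "<book><name>ABCD</name></book>"
        + "<book><name>INTRODUCING JAVA</name></book>"
        + "<book><name>ABCD</name></book>"
        + "</booklist>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Walks the live list from the last index down to 0, removing matching books.
    static int removeBooksNamedBackwards(Document doc, String target) {
        NodeList books = doc.getElementsByTagName("book");
        int removed = 0;
        for (int i = books.getLength() - 1; i >= 0; i--) {
            Element book = (Element) books.item(i);
            NodeList nameNodes = book.getElementsByTagName("name");
            if (nameNodes.getLength() > 0
                    && nameNodes.item(0).getTextContent().trim().equalsIgnoreCase(target)) {
                // Only indices above i shift, and they were visited already.
                book.getParentNode().removeChild(book);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(removeBooksNamedBackwards(doc, "ABCD")); // prints 4
        System.out.println(doc.getElementsByTagName("book").getLength()); // prints 2
    }
}
```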


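I changed one more thing: in your code you iterate over every element in the document and then filter the <book> nodes by hand. I think it is better to select only the <book> nodes in the first place, using

NodeList list = doc.getElementsByTagName("book");

instead of

NodeList list = doc.getElementsByTagName("*");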