Question

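My task is to modify a String that contains an RSS feed. It has <link> elements. I need to modify these link elements and then output the whole feed. I tried using DocumentBuilder, but every time I try to modify a node, all of its descendant nodes are deleted.

Can anyone suggest a simple way to retrieve and modify these nodes and then print the entire feed?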

public Document XMLParser(String rssFeed){
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = null;
    String nodeContents = null;
    String newXML = "";
    try {
        docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(new InputSource(new ByteArrayInputStream(rssFeed.getBytes("utf-8"))));

        Node node = doc.getFirstChild();
        NodeList list = node.getChildNodes();
        NodeList nodeList = doc.getElementsByTagName("*");

        for (int i = 0; i < nodeList.getLength(); i++) {
            Node curNode = nodeList.item(i);
            if ("link".equals(curNode.getNodeName()) || "channel".equals(curNode.getNodeName())) {
                nodeContents = curNode.getTextContent();
                nodeContents = "new contents";
                curNode.setTextContent(nodeContents);
            }
        }
        return doc;

    }catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}

Sample RSS:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
    <title>title for the channel</title>
    <link><![CDATA[www.whatever.com]]></link>
    <description><![CDATA[description of the channel.]]></description>
    <item>
        <title><![CDATA[title of the link]]></title>
        <description><![CDATA[description of the link]]></description>
        <link><![CDATA[www.whatever.com]]></link>
        <enclosure url="thepictureURL" length="21830" type="image/png" />
        <pubDate>Thu, 01 Jan 2000 00:00:00 EDT</pubDate>
    </item>
</channel>
</rss>

Answer 1

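Watch out for setTextContent(text). If you call it on a node that has child nodes, all of those children are removed and replaced with a single text node containing text. In your loop the condition also matches "channel", so the call on <channel> wipes out everything inside it.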
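
To make that behavior concrete, here is a minimal sketch that counts the descendant elements of a `<channel>` before and after calling `setTextContent` on it. The class name `SetTextContentDemo`, the helper `childElementCounts`, and the tiny inline feed are made up for illustration; only the DOM calls come from the standard JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SetTextContentDemo {

    // Returns {descendant element count before, descendant element count after}.
    // The XML below is a hypothetical two-element channel, not the feed from the question.
    static int[] childElementCounts() throws Exception {
        String xml = "<channel><title>t</title><link>www.whatever.com</link></channel>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Element channel = doc.getDocumentElement();

        // <title> and <link> are the descendant elements at this point.
        int before = channel.getElementsByTagName("*").getLength();

        // Replaces every child of <channel> with one text node.
        channel.setTextContent("new contents");

        int after = channel.getElementsByTagName("*").getLength();
        return new int[] { before, after };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = childElementCounts();
        System.out.println("descendant elements before: " + counts[0]
                + ", after: " + counts[1]);
    }
}
```

Calling `setTextContent` only on leaf elements such as `<link>` avoids the problem, because there are no child elements to lose.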

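If the RSS is not too large, you can load it into memory by parsing it into a DOM, modify the contents of the <link> nodes, and then serialize the DOM back to a string: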

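import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Both methods below belong inside a class of your choice.
public static String processLinks(String rssFeed) throws Exception {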
  DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
  DocumentBuilder builder = docFactory.newDocumentBuilder();
  Document doc = builder.parse(new InputSource(new StringReader(rssFeed)));

  NodeList nodeList = doc.getElementsByTagName("link");
  for (int i = 0; i < nodeList.getLength(); i++) {
    Node link = nodeList.item(i);
    String value = link.getTextContent();
    // Do the processing. For example, add a missing scheme:
    if (!value.startsWith("http://")) {
      link.setTextContent("http://" + value);
    }
    }
  }
  return toString(doc);
}

private static String toString(Document doc) throws Exception {
  TransformerFactory tf = TransformerFactory.newInstance();
  Transformer transformer = tf.newTransformer();
  transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
  StringWriter writer = new StringWriter();
  transformer.transform(new DOMSource(doc), new StreamResult(writer));
  return writer.toString();
}
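
One thing to be aware of: the sample feed wraps its link values in CDATA sections, and `setTextContent` replaces them with plain text nodes, so the serialized output will no longer contain `<![CDATA[...]]>` for the modified links. If that matters, the following sketch shows one possible way to keep it, by asking the transformer to write the text of `<link>` elements as CDATA through the standard `OutputKeys.CDATA_SECTION_ELEMENTS` property. The class name `RssLinkRewriter` and the method name `prefixLinks` are invented for this example; the link handling mirrors the answer above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RssLinkRewriter {

    // Prefixes every <link> value with "http://" when it is missing,
    // then serializes the whole document back to a String.
    static String prefixLinks(String rssFeed) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(rssFeed)));

        NodeList links = doc.getElementsByTagName("link");
        for (int i = 0; i < links.getLength(); i++) {
            Node link = links.item(i);
            String value = link.getTextContent();
            if (!value.startsWith("http://")) {
                // Safe here: <link> holds only text, so no child elements are lost.
                link.setTextContent("http://" + value);
            }
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Ask the serializer to emit the text of <link> elements as CDATA sections.
        transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "link");

        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, shortened feed used only for this demonstration.
        String rss = "<rss version=\"2.0\"><channel><title>t</title>"
                + "<link><![CDATA[www.whatever.com]]></link></channel></rss>";
        System.out.println(prefixLinks(rss));
    }
}
```

Elements not listed in the property, such as `<title>` and `<description>`, are written as ordinary escaped text unless their CDATA nodes were left untouched in the DOM.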
