Getting a node's inner XML as a String in Java DOM

Time: 2010-07-21 15:14:33

Tags: java xml dom

I have an XML org.w3c.dom.Node that looks like this:

<variable name="variableName">
    <br /><strong>foo</strong> bar
</variable>

How can I get the <br /><strong>foo</strong> bar part as a string?

10 answers:

Answer 0 (score: 43)

Same problem. To solve it, I wrote this helper function:

public String innerXml(Node node) {
    DOMImplementationLS lsImpl = (DOMImplementationLS)node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < childNodes.getLength(); i++) {
       sb.append(lsSerializer.writeToString(childNodes.item(i)));
    }
    return sb.toString(); 
}
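
To try the helper end to end against the XML from the question, a minimal sketch could look like the following. The class name InnerXmlLsDemo and the parseRoot helper are illustrative names, not part of the answer. The sketch also sets the serializer's "xml-declaration" parameter to false, which the helper above does not do; this is an addition, made because some LSSerializer implementations otherwise prefix each serialized child with an XML declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class InnerXmlLsDemo {

    // Hypothetical helper: parses an XML string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Same approach as the helper above, with the XML declaration switched off.
    static String innerXml(Node node) {
        DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument()
                .getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = lsImpl.createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", false);

        NodeList children = node.getChildNodes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            // Serialize each child separately so the wrapping element is left out
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node variable = parseRoot(
                "<variable name=\"variableName\"><br /><strong>foo</strong> bar</variable>");
        System.out.println(innerXml(variable));
    }
}
```

The exact spelling of the empty element in the output (for example <br/> versus <br />) depends on the serializer, so it is best not to rely on it.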

Answer 1 (score: 6)

Answer 2 (score: 4)

If you are using jOOX, you can wrap your node in a jQuery-like syntax and call toString() on it:

$(node).toString();

Internally, it uses an identity transformer, like this:

ByteArrayOutputStream out = new ByteArrayOutputStream();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
Source source = new DOMSource(element);
Result target = new StreamResult(out);
transformer.transform(source, target);
return out.toString();
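
Note that an identity transform applied to an element serializes the element itself, so the result includes the wrapping tag rather than only the inner XML. A minimal sketch of that behavior, assuming the default JDK transformer; the class name IdentityTransformDemo and the parseRoot helper are illustrative, and a StringWriter is swapped in for the ByteArrayOutputStream so that no charset conversion is involved:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IdentityTransformDemo {

    // Hypothetical helper: parses an XML string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Serializes the node itself (its own tags included) with an identity transform.
    static String outerXml(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    public static void main(String[] args) {
        Node variable = parseRoot(
                "<variable name=\"variableName\"><br /><strong>foo</strong> bar</variable>");
        // The printed string still starts with the <variable ...> tag
        System.out.println(outerXml(variable));
    }
}
```

To get only the inner XML with this technique, the transform has to be applied to each child node in turn, or the outer tag has to be stripped afterwards.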

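Answer 3 (score: 2)

If you don't want to rely on external libraries, the following solution may come in handy. If you have a node <parent><child name="Nina"/></parent> and you want to extract the child elements of the parent element, proceed as follows: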

    StringBuilder resultBuilder = new StringBuilder();
    // Get all children of the given parent node
    NodeList children = parent.getChildNodes();
    try {

        // Set up the output transformer
        TransformerFactory transfac = TransformerFactory.newInstance();
        Transformer trans = transfac.newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        for (int index = 0; index < children.getLength(); index++) {
            Node child = children.item(index);

            // Use a fresh writer per child so earlier output is not appended again
            StringWriter stringWriter = new StringWriter();
            StreamResult streamResult = new StreamResult(stringWriter);

            // Print the DOM node
            DOMSource source = new DOMSource(child);
            trans.transform(source, streamResult);
            // Append child to end result
            resultBuilder.append(stringWriter.toString());
        }
    } catch (TransformerException e) {
        //Error handling goes here
    }
    return resultBuilder.toString();

Answer 4 (score: 2)

Building on Andrey M's answer, I had to modify the code slightly to get the complete DOM document. If you just use

 NodeList childNodes = node.getChildNodes();

it doesn't give you the root element. To include the root element (and get the complete .xml document), I used:

 public String innerXml(Node node) {
     DOMImplementationLS lsImpl = (DOMImplementationLS)node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
     LSSerializer lsSerializer = lsImpl.createLSSerializer();
     lsSerializer.getDomConfig().setParameter("xml-declaration", false);
     StringBuilder sb = new StringBuilder();
     sb.append(lsSerializer.writeToString(node));
     return sb.toString(); 
 }

Answer 5 (score: 1)

The problem I had with the last answer is that the method 'nodeToStream()' is undefined; so here is my version:

public static String toString(Node node) {
    String xmlString = "";
    try {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        //transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        Source source = new DOMSource(node);

        StringWriter sw = new StringWriter();
        StreamResult result = new StreamResult(sw);

        transformer.transform(source, result);
        xmlString = sw.toString ();

    } catch (Exception ex) {
        ex.printStackTrace ();
    }

    return xmlString;
}

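Answer 6 (score: 0)

The best solution so far, Andrey M's, requires a specific implementation, which may cause problems in the future. Here is the same approach, but using whatever serialization the JDK provides (that is, whatever is configured to be used).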

public static String innerXml(Node node) throws Exception
{
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // Serialize the children of the node itself; for a Document, see the note below
        NodeList childNodes = node.getChildNodes();
        for (int i = 0; i < childNodes.getLength(); i++) {
            transformer.transform(new DOMSource(childNodes.item(i)), new StreamResult(writer));
        }
        return writer.toString();
}

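If you are dealing with a document rather than a node, you have to go one level deeper and use node.getFirstChild().getChildNodes(). To make this more robust, however, you should look for the first Element rather than assume there is only one node. XML must have a single root element, but it can have multiple nodes, including comments, entities, and whitespace text.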

        Node rootElement = docRootNode.getFirstChild();
        while (rootElement != null && rootElement.getNodeType() != Node.ELEMENT_NODE)
            rootElement = rootElement.getNextSibling();
        if (rootElement == null)
            throw new RuntimeException("No root element found in given document node.");

        NodeList childNodes = rootElement.getChildNodes();
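
As a sanity check on the sibling walk above, the following sketch runs it on a document whose first child is a comment and compares the result with Document.getDocumentElement(), the standard DOM accessor for the root element. The class name RootElementDemo and the parse helper are illustrative names, and the sample XML is made up for the demonstration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RootElementDemo {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // The sibling walk from the answer, wrapped in a method.
    static Element findRootElement(Document doc) {
        Node candidate = doc.getFirstChild();
        while (candidate != null && candidate.getNodeType() != Node.ELEMENT_NODE) {
            candidate = candidate.getNextSibling();
        }
        if (candidate == null) {
            throw new IllegalStateException("No root element found in given document node.");
        }
        return (Element) candidate;
    }

    public static void main(String[] args) {
        // The comment before <root> becomes the document's first child
        Document doc = parse("<!-- header --><root><a/>text</root>");
        Element viaLoop = findRootElement(doc);
        Element viaApi = doc.getDocumentElement();
        System.out.println(viaLoop.getNodeName());
        System.out.println(viaLoop.isSameNode(viaApi));
    }
}
```

When the starting point is known to be a Document, getDocumentElement() is the shorter route; the loop is still useful when only a generic Node reference is available.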

If I were to suggest a library to handle this, try JSoup, which is mainly intended for HTML but works with XML too. I haven't tested it, though.

Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
fileContents.put(Attributes.BODY, doc.body().html());
// versus: doc.body().outerHtml()

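Answer 7 (score: 0)

I would like to extend Andrey M.'s very good answer:

It can happen that a node is not serializable, which in some implementations leads to the following exception: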

org.w3c.dom.ls.LSException: unable-to-serialize-node: 
            unable-to-serialize-node: The node could not be serialized.

I ran into this problem on Wildfly 13 with the "org.apache.xml.serialize.DOMSerializerImpl.writeToString(DOMSerializerImpl)" implementation.

To solve this problem, I suggest a few changes to Andrey M.'s code sample:

private static String innerXml(Node node) {
    DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    lsSerializer.getDomConfig().setParameter("xml-declaration", false); 
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < childNodes.getLength(); i++) {
        Node innerNode = childNodes.item(i);
        if (innerNode!=null) {
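            if (innerNode.hasChildNodes() || innerNode.getNodeType() == Node.ELEMENT_NODE) {
                // Elements are serialized even when empty (e.g. <br />), since getNodeValue() is null for them
                sb.append(lsSerializer.writeToString(innerNode));
            } else {
                sb.append(innerNode.getNodeValue());
            }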
        }
    }
    return sb.toString();
}

I also incorporated Nyerguds's comment. This works for me on Wildfly 13.

Answer 8 (score: -1)

Building on Lukas Eder's solution, we can extract the inner XML, as in .NET, as follows:

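public static String innerXml(Node node, String tag) {
    String xmlstring = toString(node);
    // Strip the opening tag at the start and the closing tag at the end
    xmlstring = xmlstring.replaceFirst("^<" + tag + ">", "").replaceFirst("</" + tag + ">\\s*$", "");
    return xmlstring;
}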

public static String toString(Node node){       
    String xmlString = "";
    Transformer transformer;
    try {
        transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        //transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StreamResult result = new StreamResult(new StringWriter());

        xmlString = nodeToStream(node, transformer, result);

    } catch (TransformerConfigurationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (TransformerFactoryConfigurationError e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (TransformerException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }catch (Exception ex){
        ex.printStackTrace();
    }

    return xmlString;               
}

Example:

If Node name points to xml with string representation "<Name><em>Chris</em>tian<em>Bale</em></Name>" 
String innerXml = innerXml(name,"Name"); //returns "<em>Chris</em>tian<em>Bale</em>"

Answer 9 (score: -1)

Here is an alternative solution for extracting the content of an org.w3c.dom.Node. It also works if the node content contains no XML tags:

private static String innerXml(Node node) throws TransformerFactoryConfigurationError, TransformerException {
    StringWriter writer = new StringWriter();
    String xml = null;
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.transform(new DOMSource(node), new StreamResult(writer));
    // now remove the outer tag....
    xml = writer.toString();
    xml = xml.substring(xml.indexOf(">") + 1, xml.lastIndexOf("</"));
    return xml;
}
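
One edge case of the substring approach is an element with no content: it is typically serialized without a separate closing tag, so lastIndexOf("</") returns -1 and substring throws. The sketch below adds a guard for that case. It is a variation on the method above, not the answer's own code, and the class name StripOuterTagDemo and the parseRoot helper are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class StripOuterTagDemo {

    // Hypothetical helper: parses an XML string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Serializes the node, then cuts off its own opening and closing tags.
    static String innerXml(Node node) {
        String xml;
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            xml = writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
        int start = xml.indexOf('>') + 1;
        int end = xml.lastIndexOf("</");
        // No closing tag after the opening tag means the element has no content
        if (end < start) {
            return "";
        }
        return xml.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(innerXml(parseRoot("<variable>plain text</variable>")));
        System.out.println(innerXml(parseRoot("<variable/>")).isEmpty());
    }
}
```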