Flattening XML using multiple XPath expressions

Date: 2009-12-26 18:02:07

Tags: java xml xslt xpath

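I am looking for a generic algorithm that flattens an XML file into a table, given multiple XPath expressions. Everything I have tried so far has failed because of how the available XPath engine implementations behave.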

Given the XML:

<A Name="NameA">
<B Name="NameB1">
    <C Name="NameC1"/>
    <C Name="NameC2"/>
    <C Name="NameC3"/>
</B>
<B Name="NameB2">
    <C Name="NameC4"/>
    <C Name="NameC5"/>
    <C Name="NameC6"/>
</B>
</A>

and the following XPath expressions as input:

/A/@Name
/A/B/@Name
/A/B/C/@Name

the output should be a table of the following form:

NameA NameB1 NameC1
NameA NameB1 NameC2
NameA NameB1 NameC3
NameA NameB2 NameC4
NameA NameB2 NameC5
NameA NameB2 NameC6

I have been trying to produce this table with the available Java XML packages (javax.xml.xpath, jdom, etc.), but to no avail.

It seems that code like

XPath.evaluate("/A/B/C/@Name", doc, XPathConstants.NODESET);

returns "detached" nodes that cannot be traversed.

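I tried many recursive approaches over the nodes returned by XPath evaluation, without success. I also considered a DFS traversal of the DOM tree, but every XPath evaluator seems to return detached nodes for which node.getParent() always returns null.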

Any ideas for a "multi-XPath-expression-aware" algorithm that can keep track of nested XPath expressions?

I have a feeling this could be done easily with XSLT, but my XSLT skills are very rusty...

3 Answers:

Answer 0: (score: 3)

This XSLT:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output indent="yes" />

    <xsl:template match="/">
    <table>
<!--Based on your comments, it sounds as if you do not know the structure of the XML you will be dealing with (element nesting or attribute names).
        That makes it a little difficult.
        For the example XML you gave, the following for-each will work:-->
        <xsl:for-each select="//C"> <!--You could also use "/A/B/C" -->
        <tr>
<!--This walks up the node tree and creates a column for the current element, as well as for each of its ancestors, using the first attribute as the value.-->
            <xsl:for-each select="ancestor-or-self::*">
            <td><xsl:value-of select="@*[1]"/></td>
            </xsl:for-each>
        </tr>
        </xsl:for-each>
    </table>
    </xsl:template>

</xsl:stylesheet>

works on the provided XML and produces the following:

<?xml version="1.0" encoding="UTF-16"?>
<table>
<tr>
<td>NameA</td>
<td>NameB1</td>
<td>NameC1</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB1</td>
<td>NameC2</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB1</td>
<td>NameC3</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB2</td>
<td>NameC4</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB2</td>
<td>NameC5</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB2</td>
<td>NameC6</td>
</tr>
</table>
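
Since the question is tagged java, here is a minimal sketch of applying a stylesheet like the one above through the standard javax.xml.transform API. The class name XsltFlatten, the constants and the transform helper are hypothetical names chosen for this illustration, and the stylesheet string is a condensed copy of the one above (with the XML declaration omitted from the output), not something taken from the original answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltFlatten {
    // Sample document from the question, with the root element closed.
    static final String SAMPLE_XML =
        "<A Name=\"NameA\">"
        + "<B Name=\"NameB1\"><C Name=\"NameC1\"/><C Name=\"NameC2\"/><C Name=\"NameC3\"/></B>"
        + "<B Name=\"NameB2\"><C Name=\"NameC4\"/><C Name=\"NameC5\"/><C Name=\"NameC6\"/></B>"
        + "</A>";

    // Condensed copy of the stylesheet: one row per C element,
    // one cell per ancestor-or-self element, filled from its first attribute.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output indent=\"yes\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/\"><table>"
        + "<xsl:for-each select=\"//C\"><tr>"
        + "<xsl:for-each select=\"ancestor-or-self::*\">"
        + "<td><xsl:value-of select=\"@*[1]\"/></td>"
        + "</xsl:for-each>"
        + "</tr></xsl:for-each>"
        + "</table></xsl:template></xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the given XML text.
    static String transform(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SAMPLE_XML));
    }
}
```

The exact whitespace of the result depends on the XSLT processor in use; the JDK's built-in XSLT 1.0 processor is enough here, so no extra library should be needed.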

Answer 1: (score: 0)

Edit: the same thing, but using XPath:

        XPathFactory f = XPathFactory.newInstance();
        XPath xPath = f.newXPath();
        NodeList list = (NodeList) xPath.evaluate("//*[* and not(*/*)]/*", new InputSource(stream), XPathConstants.NODESET);

        for (int i = 0; i < list.getLength(); i++) {
            Node n = list.item(i);
            Stack<Node> s = new Stack<Node>();

            while (n != null) {
                s.push(n);
                n = n.getParentNode();
            }

            s.pop(); //this is document root, we don't need it

            while (s.size() > 0) {
                NamedNodeMap map = s.pop().getAttributes();

                for (int j = 0; j < map.getLength(); j++) {
                    Node node = map.item(j);
                    System.out.print(node.getNodeName() + ": " + node.getTextContent() + " ");
                }
            }

            System.out.println("");
        } 

You can use plain DOM functionality. It is not as nice as XPath, but it is generic and works with any XML file.

If I understand you correctly, this code should do the trick:

    String xml = "<A Name=\"NameA\">\n" +
            "<B Name=\"NameB1\">\n" +
            "        <C Name=\"NameC1\"> </C>\n" +
            "        <C Name=\"NameC2\"/>\n" +
            "        <C Name=\"NameC3\"/>\n" +
            "</B>\n" +
            "<B Name=\"NameB2\">\n" +
            "        <C Name=\"NameC4\"/>\n" +
            "        <C Name=\"NameC5\"/>\n" +
            "        <C Name=\"NameC6\"/>\n" +
            "</B></A>";
    try {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes()));

        Queue<Node> q = new LinkedList<Node>();

        // start from the root element; getFirstChild() could be a comment or processing instruction
        q.add(doc.getDocumentElement());
        //start BFS
        while (q.size() > 0) {
            Node n = q.poll();
            NodeList childNodes = n.getChildNodes();
            //add all children of current node
            int elemNodes = 0;
            for (int i = 0; i < childNodes.getLength(); i++) {
                Node node = childNodes.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    elemNodes++;
                    q.add(node);
                }
            }
            //if node has no children, print its path
            if (elemNodes == 0) {
                Stack<Node> s = new Stack<Node>();

                while (n != null) {
                    s.push(n);
                    n = n.getParentNode();
                }

                s.pop(); //this is document root, we don't need it

                while (s.size() > 0)
                    System.out.print(s.pop().getAttributes().getNamedItem("Name").getTextContent() + " ");

                System.out.println("");
            }
        }
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
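
Both snippets above derive the columns from the tree shape rather than from the XPath expressions in the question. One possible way to drive the table from the expressions themselves is sketched below. It relies on the fact that in the W3C DOM an attribute node returns null from getParentNode(), while Attr.getOwnerElement() gives the element it belongs to. The class XPathFlatten and its helpers are hypothetical names for this illustration, and the sketch assumes that the last expression is the deepest one (one row per match) and that every other expression selects a node on or above each of those matches.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathFlatten {
    // Element a result node hangs on: the owner element for attributes, the node itself otherwise.
    static Node anchor(Node n) {
        return n.getNodeType() == Node.ATTRIBUTE_NODE ? ((Attr) n).getOwnerElement() : n;
    }

    // True if 'candidate' is 'n' itself or one of its ancestors.
    static boolean isAncestorOrSelf(Node candidate, Node n) {
        for (Node cur = n; cur != null; cur = cur.getParentNode()) {
            if (cur.isSameNode(candidate)) return true;
        }
        return false;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Assumption: the last expression is the deepest; each of its matches becomes one row.
    static List<List<String>> flatten(Document doc, List<String> expressions) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<NodeList> columns = new ArrayList<>();
        for (String expression : expressions) {
            columns.add((NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET));
        }
        NodeList leaves = columns.get(columns.size() - 1);
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < leaves.getLength(); i++) {
            Node leafAnchor = anchor(leaves.item(i));
            List<String> row = new ArrayList<>();
            for (NodeList column : columns) {
                String value = ""; // left empty when no match lies on the leaf's ancestor chain
                for (int j = 0; j < column.getLength(); j++) {
                    Node candidate = column.item(j);
                    if (isAncestorOrSelf(anchor(candidate), leafAnchor)) {
                        value = candidate.getTextContent();
                        break;
                    }
                }
                row.add(value);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Calling XPathFlatten.flatten(doc, Arrays.asList("/A/@Name", "/A/B/@Name", "/A/B/C/@Name")) on the sample document should give one row per C element. The matching is a plain nested loop, so it is only meant for small documents; a larger input would probably want a map from anchor element to value for each column.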

Answer 2: (score: 0)

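I expect you can do this with XSLT 2. (If you are limited to XSLT 1, I am not sure.) For a tutorial, see http://www.xml.com/pub/a/2003/11/05/tr.html. You can have multiple grouping instructions, all of which can use XPath. I cannot offer code for your problem right away, but if you read the tutorial, I think it maps onto your problem well.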