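Tricky recursive XSLT tree transformation

Time: 2009-07-10 15:05:08

Tags: xslt

(Note: if you are good with a functional programming language other than XSLT, this question may still be for you. It is not necessarily XSLT itself that makes this tricky for me.)

How would I transform this: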

<!-- note that empty cells are implicit and therefore nonexistent -->
<tables>
  <table pos="1">
    <row pos="1">
      <cell pos="1">Category A</cell>
      <cell pos="10">Category B</cell>
    </row>
    <row pos="2">
      <cell pos="1">Sub-Category 1</cell>
      <cell pos="4">Sub-Category 2</cell>
      <cell pos="6">Sub-Category 3</cell>
      <cell pos="10">Sub-Category 1</cell>
    </row>
  </table>
  <table pos="2">
    <row pos="1">
      <cell pos="1">Category A</cell>
      <cell pos="11">Category B</cell>
    </row>
    <row pos="2">
      <cell pos="1">Sub-Category 1</cell>
      <cell pos="2">Sub-Category 2</cell>
      <cell pos="4">Sub-Category 3</cell>
      <cell pos="10">Sub-Category 4</cell>
      <cell pos="11">Sub-Category 1</cell>
      <cell pos="12">Sub-Category 2</cell>
    </row>
  </table>
</tables>

into this:

<tree>
  <node label="Category A">
    <positions>
      <pos table="1" row="1" cell="1" />
      <pos table="2" row="1" cell="1" />
    </positions>
    <node label="Sub-Category 1">
      <positions>
        <pos table="1" row="2" cell="1" />
        <pos table="2" row="2" cell="1" />
      </positions>
    </node>
    <node label="Sub-Category 2">
      <positions>
        <pos table="1" row="2" cell="4" />
        <pos table="2" row="2" cell="2" />
      </positions>
    </node>
    <node label="Sub-Category 3">
      <positions>
        <pos table="1" row="2" cell="6" />
        <pos table="2" row="2" cell="4" />
      </positions>
    </node>
    <node label="Sub-Category 4">
      <positions>
        <pos table="2" row="2" cell="10" />
      </positions>
    </node>
  </node>
  <node label="Category B">
    <positions>
      <pos table="1" row="1" cell="10" />
      <pos table="2" row="1" cell="11" />
    </positions>
    <node label="Sub-Category 1">
      <positions>
        <pos table="1" row="2" cell="10" />
        <pos table="2" row="2" cell="11" />
      </positions>
    </node>
    <node label="Sub-Category 2">
      <positions>
        <pos table="2" row="2" cell="12" />
      </positions>
    </node>
  </node>
</tree>

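Note that each table represents a 2D category tree of arbitrary depth. Each row holds the children of the previous row, and a sub-category is attached to its parent by position: its position must be >= that of the parent on the left and < that of the next parent to the right (if there is one).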
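The attachment rule can be sketched outside XSLT. The Python below is only an illustration for a single pair of rows; the function name `children_of` and the representation of a row as a list of `(pos, label)` pairs sorted by position are assumptions made for this example, not part of the input format.

```python
def children_of(parent_row, child_row, parent_index):
    """Return the cells of child_row that belong to parent_row[parent_index].

    Both rows are lists of (pos, label) pairs, assumed sorted by pos.
    """
    left = parent_row[parent_index][0]
    # Position of the next parent to the right, or None for the last parent.
    if parent_index + 1 < len(parent_row):
        right = parent_row[parent_index + 1][0]
    else:
        right = None
    return [(pos, label) for pos, label in child_row
            if pos >= left and (right is None or pos < right)]


# Rows 1 and 2 of table 1 from the input above.
row1 = [(1, "Category A"), (10, "Category B")]
row2 = [(1, "Sub-Category 1"), (4, "Sub-Category 2"),
        (6, "Sub-Category 3"), (10, "Sub-Category 1")]

print(children_of(row1, row2, 0))  # cells attached to "Category A"
print(children_of(row1, row2, 1))  # cells attached to "Category B"
```

The cell at position 10 goes to "Category B" because the rule is inclusive on the left and exclusive on the right.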

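I want the output to be a tree, grouped by label (only within the parent-child relationship described above, but across tables). For example, "Sub-Category 1" exists separately under "Category A" and under "Category B".

There are n rows, each with n cells. You can picture the input data structure as a 3D cube, with each table representing roughly the same data for a different year.

If it were just one table ("year"), I could do the above, but with n tables I am having real trouble finding a way to apply it, because the parents sit at different positions in each table.

Ultimately I am interested in an XSLT 1.0 solution free of extension functions, though any (algorithmic) help is much appreciated. This problem has been bugging me for quite a while and I cannot seem to get my head around it. I feel there must be a clean solution that I just cannot see. I believe it can be done with one recursive template, a few keys and some really clever XPath.

I suppose this question is a good candidate for the Tumbleweed badge. :-)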

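2 answers:

Answer 0: (score: 1)

So the cells in each row are actually children of cells in the previous row, where the child's "pos" value is >= the parent cell's "pos" value and < the "pos" value of the following parent in the previous row (if there is one)?

If you are set on using XSL, take a look at the following-sibling axis. It will be a messy select path, though. It may be easier to do two transforms: one that organizes the flat data into a hierarchical format, and another that produces the final output.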
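To make the two-pass idea concrete, here is a minimal sketch in Python rather than XSL. It is one possible reading of the suggestion, not a tested production solution: `nest`, `merge`, `emit` and `transform` are hypothetical names, and the sketch assumes that every `pos` attribute is an integer.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<tables>
  <table pos="1">
    <row pos="1"><cell pos="1">Category A</cell><cell pos="10">Category B</cell></row>
    <row pos="2"><cell pos="1">Sub-Category 1</cell><cell pos="4">Sub-Category 2</cell>
      <cell pos="6">Sub-Category 3</cell><cell pos="10">Sub-Category 1</cell></row>
  </table>
  <table pos="2">
    <row pos="1"><cell pos="1">Category A</cell><cell pos="11">Category B</cell></row>
    <row pos="2"><cell pos="1">Sub-Category 1</cell><cell pos="2">Sub-Category 2</cell>
      <cell pos="4">Sub-Category 3</cell><cell pos="10">Sub-Category 4</cell>
      <cell pos="11">Sub-Category 1</cell><cell pos="12">Sub-Category 2</cell></row>
  </table>
</tables>"""


def nest(table):
    """Pass 1: turn one flat <table> into nested dicts, one per cell."""
    roots, previous = None, []
    for row in table.findall("row"):
        current = []
        for cell in row.findall("cell"):
            node = {
                "label": cell.text,
                "pos": int(cell.get("pos")),
                "where": (table.get("pos"), row.get("pos"), cell.get("pos")),
                "children": [],
            }
            current.append(node)
            # The parent is the right-most cell of the previous row that
            # does not lie to the right of this cell.
            candidates = [p for p in previous if p["pos"] <= node["pos"]]
            if candidates:
                max(candidates, key=lambda p: p["pos"])["children"].append(node)
        if roots is None:
            roots = current  # the first row holds the root categories
        previous = current
    return roots or []


def merge(nodes):
    """Pass 2: group sibling nodes by label, keeping first-seen order."""
    grouped = {}  # dicts preserve insertion order in Python 3.7+
    for node in nodes:
        entry = grouped.setdefault(node["label"], {"where": [], "children": []})
        entry["where"].append(node["where"])
        entry["children"].extend(node["children"])
    return [(label, e["where"], merge(e["children"]))
            for label, e in grouped.items()]


def emit(merged, parent):
    """Write the merged groups as <node>/<positions>/<pos> elements."""
    for label, where, children in merged:
        node = ET.SubElement(parent, "node", label=label)
        positions = ET.SubElement(node, "positions")
        for table, row, cell in where:
            ET.SubElement(positions, "pos", table=table, row=row, cell=cell)
        emit(children, node)


def transform(xml_text):
    tables = ET.fromstring(xml_text).findall("table")
    roots = [node for table in tables for node in nest(table)]
    tree = ET.Element("tree")
    emit(merge(roots), tree)
    return tree


if __name__ == "__main__":
    print(ET.tostring(transform(SAMPLE), encoding="unicode"))
```

Because each table is nested on its own before anything is merged, it no longer matters that the parents sit at different positions in different tables.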

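Personally, I think I would write a program that processes the DOM tree explicitly, particularly if this is a one-off operation. I suspect it would be just as quick to write, with no chance of strange corner cases in the selection logic.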

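Answer 1: (score: 1)

Here is my first attempt at this. It looked easy until I ran into trouble with the recursive grouping. I thought about using a node-set, but I believe that is an extension to XSL 1.0, right? So, no node-set then? Lacking that, it goes like this:

  1. Find all root nodes and handle them differently (they have no parent, so different rules apply).

  2. For each such node, crawl over every //cell node and test whether it could be a direct child of the current node. If it could be a child, we still need to group it, so...

  3. Crawl over every //cell node along the preceding axis. For each such preceding //cell node, check whether it, too, could be a direct child of the parent from step 2.

  4. After finding a preceding //cell node that is a child, compare its label with that of the child found in step 2. If they are equal, output nothing (because this label was output earlier). Otherwise...

  5. Start outputting the child node. Crawl over every //cell node once more and find all positions at which this label occurs as a child of the parent from step 2.

  6. Recurse: go back to step 2, using this child node as the new parent.

Tested on MSXML2, it appears to generate what you asked for. It seems to work as I understand the problem, and it should keep working even if I add more <row> elements. It was certainly fun to write, and it had me thinking in directions I don't usually go when working with XSL. I guess that's good...? Then again, maybe this is an example of a problem that can be solved with XSL but is more easily solved another way.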
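For readers who find the crawl easier to follow outside XSL, steps 2 to 5 can be sketched in Python. This is an illustration under assumptions, not a port of the stylesheet: the `Cell` tuple and the names `is_child` and `grouped_children` are made up for the example, rows are assumed to be numbered consecutively, and a parent is identified only by its label and by sitting in the row directly above.

```python
from collections import namedtuple

# Hypothetical flat representation of every //cell, in document order.
Cell = namedtuple("Cell", "table row pos label")


def is_child(cell, parent_label, cells):
    """Steps 2-3: is `cell` a direct child of a cell labelled `parent_label`?"""
    prev_row = [c for c in cells
                if c.table == cell.table and c.row == cell.row - 1]
    for i, parent in enumerate(prev_row):
        if parent.label != parent_label:
            continue
        # Position of the parent's right-hand neighbour, if it has one.
        right = prev_row[i + 1].pos if i + 1 < len(prev_row) else None
        if cell.pos >= parent.pos and (right is None or cell.pos < right):
            return True
    return False


def grouped_children(parent_label, cells):
    """Steps 2-5: distinct child labels in document order, with positions."""
    result = []
    for i, cell in enumerate(cells):
        if not is_child(cell, parent_label, cells):
            continue
        # Step 4: skip the label if an earlier child already carried it.
        if any(c.label == cell.label and is_child(c, parent_label, cells)
               for c in cells[:i]):
            continue
        # Step 5: collect every position where this label is such a child.
        positions = [(c.table, c.row, c.pos) for c in cells
                     if c.label == cell.label
                     and is_child(c, parent_label, cells)]
        result.append((cell.label, positions))
    return result


CELLS = [
    Cell(1, 1, 1, "Category A"), Cell(1, 1, 10, "Category B"),
    Cell(1, 2, 1, "Sub-Category 1"), Cell(1, 2, 4, "Sub-Category 2"),
    Cell(1, 2, 6, "Sub-Category 3"), Cell(1, 2, 10, "Sub-Category 1"),
    Cell(2, 1, 1, "Category A"), Cell(2, 1, 11, "Category B"),
    Cell(2, 2, 1, "Sub-Category 1"), Cell(2, 2, 2, "Sub-Category 2"),
    Cell(2, 2, 4, "Sub-Category 3"), Cell(2, 2, 10, "Sub-Category 4"),
    Cell(2, 2, 11, "Sub-Category 1"), Cell(2, 2, 12, "Sub-Category 2"),
]

print(grouped_children("Category B", CELLS))
```

As in the steps above, every candidate is re-tested against the whole cell list, so the cost grows quickly with the number of cells. The stylesheet that follows does the same work with keys and a mode="crawl" template.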

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" indent="yes"/>
    
        <xsl:key name="roots" match="/tables/table/row[1]/cell" use="text()" />
        <xsl:key name="cell-by-label" match="/tables/table/row/cell" use="text()"/>
        <xsl:variable name="cells" select="/tables/table/row/cell" />
    
        <xsl:template match="/tables">
    
            <xsl:variable name="rootCells" select="/tables/table/row[1]/cell"/>
    
            <tree>
                <!-- Work on each root-level node first. -->
                <xsl:apply-templates
                    mode="root"
                    select="$rootCells[count(.|key('roots', text())[1]) = 1]" />
            </tree>
        </xsl:template>
    
        <!--
        Root level cells are handled differently. They have no available $parent,
        for one thing. Because of that, it seems simpler to handle them as an
        exception here, rather than peppering the mode="crawl" with exceptions
        for parentless nodes.
        -->
        <xsl:template match="cell" mode="root">
            <node label="{.}">
    
                <!--
                Get a list of everywhere that this cell is found.
                We are looking for only other root-level cells here.
                -->
                <positions>
                    <xsl:for-each select="key('roots', text())">
                        <pos table="{../../@pos}" row="{../@pos}" cell="{@pos}"/>
                    </xsl:for-each>
                </positions>
    
                <!--
                Locate all child nodes, across all tables in which this node is found.
                A node is a child node if:
                1. It is in a row directly following the row in which this cell is found.
                2. If the @pos is >= the @pos of the parent
                3. If the @pos is < the @pos of the parent to the right (if there is a parent to the right)
                   Note: Meeting the above conditions is not difficult; it's grouping at this
                         point that gives trouble. If the problem permitted extension functions
                         to XSL 1.0, then perhaps we could generate a node-set and group that.
                         I've not tried that way, since I'm under the impression that a node-set
                         would be an extension to 1.0, and therefore, "cheating."
                         However, if we could generate a node-set and group based on that,
                         then the following block selects the correct nodes (I think):
    
                <xsl:for-each select="$matches">
                    <xsl:variable name="childRow" select="../following-sibling::row[1]"/>
                    <xsl:variable name="Lparent" select="@pos"/>
                    <xsl:variable name="Rparent" select="following-sibling::cell[1]/@pos"/>
                    <xsl:choose>
                    <xsl:when test="$Rparent">
                        <xsl:apply-templates
                            select="$childRow/cell[
                                    @pos &gt;= $Lparent
                                and @pos &lt; $Rparent]"
                            mode="child" />
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:apply-templates
                            select="$childRow/cell
                                [@pos &gt;= $Lparent]"
                            mode="child"/>
                    </xsl:otherwise>
                    </xsl:choose>
                </xsl:for-each>
    
                Without using a node-set, I'll try to solve this by crawling over every
                table/row/cell and test each one, every time.  Not pretty by a long shot.
                But hey, memory and processors are getting cheaper every day.
                -->
    
                <xsl:apply-templates select="$cells" mode="crawl">
                <xsl:with-param name="parent" select="."/>
                <xsl:with-param name="task">Group children by their labels</xsl:with-param>
                </xsl:apply-templates>
    
            </node>
        </xsl:template>
    
        <xsl:template match="cell" mode="child">
        <xsl:param name="parent"/>
    
            <node label="{.}">
    
                <positions>
                    <xsl:apply-templates mode="crawl"
                        select="key('cell-by-label', text())">
                    <xsl:with-param name="parent" select="$parent"/>
                    <xsl:with-param name="child"  select="."/>
                    <xsl:with-param name="task">Find all positions of a child</xsl:with-param>
                    </xsl:apply-templates>
                </positions>
    
                <!-- And.... Recursion Start Now! -->
                <xsl:apply-templates select="$cells" mode="crawl">
                <xsl:with-param name="parent" select="."/>
                <xsl:with-param name="task">Group children by their labels</xsl:with-param>
                </xsl:apply-templates>
    
            </node>
    
        </xsl:template>
    
        <xsl:template match="cell" mode="crawl">
        <xsl:param name="parent"/>
        <xsl:param name="child"/>
        <xsl:param name="task"/>
    
            <xsl:variable name="parentRow"
                select="generate-id(../preceding-sibling::row[1])"/>
    
            <xsl:variable name="parentCell"
                select="key('cell-by-label', $parent/text())
                    [$parentRow = generate-id(..)]" />
    
            <xsl:variable name="RparentPos"
                select="$parentCell/following-sibling::cell[1]/@pos"/>
    
            <!--
            This cell is a child if it is in a row directly following a row
            in which the parent cell's text value made an appearance.
            <xsl:if test="$parentCell">
    
            This cell is a child if its @pos is >= the parent cell's pos
            <xsl:if test="@pos &gt;= $parentCell/@pos">
    
            If there is a parent cell to the right of this cell's parent cell,
            then this cell is a child only if its @pos is < the right-parent
            cell's @pos.
            <xsl:if test="not($RparentPos) or @pos &lt; $RparentPos">
            -->
    
            <xsl:if test="
                $parentCell
                and (@pos &gt;= $parentCell/@pos)
                and (not($RparentPos) or @pos &lt; $RparentPos)">
    
                <xsl:choose>
    
                <!--
                If our task is to determine whether there are any nodes prior to
                the given child node in the document order which are also
                children of the parent and which have the same label value, we do
                that now. All we really want is to make a mark. We will later use
                string-length to see if we made any marks here or not.
                -->
                <xsl:when test="$task = 'Are there prior children with equal labels?'">
                    Yes
                </xsl:when>
    
                <!--
                Here, our task is to generate the <pos> nodes of the children.
                -->
                <xsl:when test="$task = 'Find all positions of a child'">
                    <pos table="{../../@pos}" row="{../@pos}" cell="{@pos}"/>
                </xsl:when>
    
                <!--
                If our task is to group children by their labels, we need to know
                if this is the first child node with this particular label. To do
                that, we crawl over all cells along the preceding axis, and if
                they are otherwise potential children (see above block), then a
                mark is made (perhaps several such marks, doesn't matter how many,
                really). If we have any marks when we are done, we know we have
                output this label before, so we don't do it again.
                -->
                <xsl:when test="$task = 'Group children by their labels'">
                    <xsl:variable name="priorMatches">
                        <xsl:apply-templates mode="crawl"
                            select="preceding::cell[text() = current()/text()]">
                        <xsl:with-param name="parent" select="$parent"/>
                        <xsl:with-param name="task">Are there prior children with equal labels?</xsl:with-param>
                        </xsl:apply-templates>
                    </xsl:variable>
    
                    <xsl:if test="string-length($priorMatches) = 0">
                        <xsl:apply-templates select="." mode="child">
                        <xsl:with-param name="parent" select="$parent"/>
                        </xsl:apply-templates>
                    </xsl:if>
                </xsl:when>
    
                </xsl:choose>
            </xsl:if>
    
        </xsl:template>
    </xsl:stylesheet>
    

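Edit 1: Reformatted a bit. Added some xsl:key elements to help find child nodes by label. Made a few other optimizations, hopefully without hurting readability.

Thoughts/comments?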