XSL to read a list of files and compile a definition list from specific elements

Date: 2012-11-19 21:51:05

Tags: html xslt xpath dita

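I have a set of files containing definitions that I want to compile into a single list. The list of files is stored in an XML file that looks like this (this is the input file):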

<report>
  <incident>
    <file>Balance_fields_selected.htm</file>
  </incident>
  <incident>
    <file>Cd_fields.htm</file>
  </incident>
</report>

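Each file named by a <file> element contains a series of <p class='Term'> elements that I need to compile into a single list. Each of these is followed by an arbitrary number of other elements that need to be grouped with it (I'm trying to do this with a key):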

<html><body>
  <p class="Term">
    <a name="Accrued_Bonus_Interest" id="Accrued_Bonus_Interest"></a>Accrued (Bonus Interest)</p>
  <p>Bonus Interest Accrued Cycle-to-Date. &#160;Amount of bonus interest that has accrued on the time deposit.</p>
  <p>Pages: &#160;View CD Detail.</p>
  <p class="Term">
    <a name="Accrued_OID" id="Accrued_OID"></a>Accrued (Original Issue Discount)</p>
  <p>Original Issue Discount Interest Accrued Year-to-Date. &#160;OID interest accrued in the current year.</p>
  <p>Pages: &#160;View CD Detail.</p>
</body></html>

The desired result looks like this:

<topic>
  <title>Arbitrary title</title>
  <body>
    <dl>
      <dlentry id="Accrued_Bonus_Interest">
        <dt>Accrued (Bonus Interest)</dt>
        <dd><p>Bonus Interest Accrued Cycle-to-Date. &#160;Amount of bonus interest that has accrued on the time deposit.</p>
          <p>Pages: View CD Detail</p></dd>
      </dlentry>
      <dlentry id="Accrued_OID">
        <dt>Accrued (Original Issue Discount)</dt>
        <dd><p>Original Issue Discount Interest Accrued Year-to-Date. &#160;OID interest accrued in the current year.</p>
          <p>Pages: View CD Detail</p></dd>
      </dlentry>
    </dl>
  </body>
</topic>

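I have a stylesheet that already does most of the work. It looks like I'm just (once again) lost on how to use keys correctly. The following stylesheet: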

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:strip-space elements="*" />

    <xsl:key name="kFollowing" match="*[not(p[@class='Term'])]"
        use="generate-id(preceding::p[@class='Term'][1])"/>

    <xsl:template match="/">
        <![CDATA[
        <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">  ]]>
        <topic id="data_dictionary">
            <title>IBS Insight Data Dictionary</title>
            <body>
                <dl>
                    <xsl:for-each select="/report/incident/file">
                        <xsl:for-each select="document(.)/descendant::p[@class='Term']">
                            <xsl:variable name="vFollowing" select="key('kFollowing', generate-id())"/>
                            <xsl:element name="dlentry">
                                <xsl:attribute name="id">
                                    <xsl:value-of select="child::a/@id"/>
                                </xsl:attribute>
                                <dt><xsl:value-of select="."/></dt>
                                <dd><xsl:value-of select="$vFollowing"/>
                                    <xsl:for-each select="following::*[$vFollowing]">
                                        <xsl:apply-templates select="."/>
                                    </xsl:for-each>
                                </dd>
                            </xsl:element>
                        </xsl:for-each>
                    </xsl:for-each>
                </dl>
            </body>
        </topic>
    </xsl:template>

</xsl:stylesheet>

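This correctly picks up the files listed in the input file and generates dlentry elements with the correct id and dt. The problem is the way my botched key assembles the dd elements. It isn't obvious from my sample XML, but what happens is that each <p class='Term'> grabs all of the content that follows it to fill its <dd>, like this: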

<topic>
  <title>Arbitrary title</title>
  <body>
    <dl>
      <dlentry id="Accrued_Bonus_Interest">
        <dt>Accrued (Bonus Interest)</dt>
        <dd>Bonus Interest Accrued Cycle-to-Date. &#160;Amount of bonus interest that has accrued on the time deposit. Pages: View CD DetailAccrued (Original Issue Discount)Original Issue Discount Interest Accrued Year-to-Date. &#160;OID interest accrued in the current year.Pages: View CD Detail</dd>
      </dlentry>
      <dlentry id="Accrued_OID">
        <dt>Accrued (Original Issue Discount)</dt>
        <dd><p>Original Issue Discount Interest Accrued Year-to-Date. &#160;OID interest accrued in the current year.</p>
          <p>Pages: View CD Detail</p></dd>
      </dlentry>
    </dl>
  </body>
</topic>

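The last entry in each file is rendered correctly, but only because there are no further nodes after it to process. Something in the way I use the key is matching too many nodes.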
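Two details in the stylesheet above account for this over-collection.

First, the predicate in following::*[$vFollowing] is always true. In XPath 1.0, a predicate whose value is a node-set is true whenever that node-set is non-empty. Since $vFollowing does not depend on the context node, the expression keeps every following element in the file, not just the grouped ones.

Second, the match pattern *[not(p[@class='Term'])] tests for a <p class="Term"> child. A Term paragraph has no such child, so the pattern also matches the next <p class="Term"> itself, which then lands in the previous term's group.

The fragment below contrasts the two ways of using the key result. It assumes the key has also been fixed so that it no longer matches the Term paragraphs:

```xml
<!-- The predicate only asks "is $vFollowing non-empty?", so this visits ALL following elements -->
<xsl:for-each select="following::*[$vFollowing]">
    <xsl:apply-templates select="."/>
</xsl:for-each>

<!-- Copying the key result directly keeps only the grouped nodes, in document order, as elements -->
<xsl:copy-of select="$vFollowing"/>
```

Two more effects explain the output shown above. xsl:value-of applied to $vFollowing outputs only the string value of the first node in the set (XSLT 1.0). Applying templates without any matching template falls back to the built-in rules, which output text rather than the <p> elements, so the dd contains plain text.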

Thanks for looking.

1 answer:

Answer 0 (score: 2)

I would change the key definition slightly:

<xsl:key name="kFollowing"
    match="*[not(self::p[@class='Term'])][preceding-sibling::p[@class='Term']]"
    use="generate-id(preceding-sibling::p[@class='Term'][1])"/>

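This matches any element that is not itself a <p class="Term"> but has a <p class="Term"> as a preceding sibling, that is, it sits at the same level of the tree as such an element. It groups those elements by the nearest preceding "Term". If you also want to allow for the case where the content following a <p class="Term"> is just text nodes (that is, not wrapped in any element), then you need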

<xsl:key name="kFollowing"
    match="node()[not(self::p[@class='Term'])][preceding-sibling::p[@class='Term']]"
    use="generate-id(preceding-sibling::p[@class='Term'][1])"/>
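For the sample HTML file in the question, either key definition yields the groups below. These were worked out by hand, ignoring whitespace-only text nodes. The <a> anchors are not included because they are children of the Term paragraphs rather than siblings of them. The Term paragraphs themselves are excluded by the self:: test.

```xml
<!-- key('kFollowing', generate-id(the Term paragraph for Accrued_Bonus_Interest)) -->
<p>Bonus Interest Accrued Cycle-to-Date. &#160;Amount of bonus interest that has accrued on the time deposit.</p>
<p>Pages: &#160;View CD Detail.</p>

<!-- key('kFollowing', generate-id(the Term paragraph for Accrued_OID)) -->
<p>Original Issue Discount Interest Accrued Year-to-Date. &#160;OID interest accrued in the current year.</p>
<p>Pages: &#160;View CD Detail.</p>
```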

Then you can simplify the inner for-each to

        <xsl:for-each select="document(.)/descendant::p[@class='Term']">
            <dlentry id="{a/@id}">
                <dt><xsl:value-of select="."/></dt>
                <dd><xsl:copy-of select="key('kFollowing', generate-id())"/></dd>
            </dlentry>
        </xsl:for-each>
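Putting the revised key and the simplified inner for-each together, a complete XSLT 1.0 stylesheet might look like the sketch below. It makes these assumptions:

- The listed files are well-formed XML in no namespace, as in the sample.
- The DOCTYPE is produced with xsl:output's doctype-public and doctype-system attributes. This is a substitution for the question's CDATA block, which a serializer would write out as escaped text rather than as a real DOCTYPE declaration.
- indent="yes" is purely cosmetic.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Emit the DITA DOCTYPE through the serializer instead of a CDATA text node -->
    <xsl:output method="xml" indent="yes"
        doctype-public="-//OASIS//DTD DITA Topic//EN"
        doctype-system="topic.dtd"/>

    <xsl:strip-space elements="*"/>

    <!-- Group each non-Term sibling under the nearest preceding <p class="Term"> -->
    <xsl:key name="kFollowing"
        match="*[not(self::p[@class='Term'])][preceding-sibling::p[@class='Term']]"
        use="generate-id(preceding-sibling::p[@class='Term'][1])"/>

    <xsl:template match="/">
        <topic id="data_dictionary">
            <title>IBS Insight Data Dictionary</title>
            <body>
                <dl>
                    <xsl:for-each select="/report/incident/file">
                        <!-- Each file URI is resolved relative to the input document -->
                        <xsl:for-each select="document(.)/descendant::p[@class='Term']">
                            <dlentry id="{a/@id}">
                                <dt><xsl:value-of select="."/></dt>
                                <!-- Copy the grouped paragraphs as elements, in document order -->
                                <dd><xsl:copy-of select="key('kFollowing', generate-id())"/></dd>
                            </dlentry>
                        </xsl:for-each>
                    </xsl:for-each>
                </dl>
            </body>
        </topic>
    </xsl:template>

</xsl:stylesheet>
```

Calling key() inside the for-each over document(.) works because, in XSLT 1.0, key() returns nodes from the document that contains the context node. Here that document is the HTML file just loaded.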