Question

I have this XML:

<main>
    <div type='scene'>
        <l>l1</l>
        <sp>A speach</sp>
        <l>l2</l>
        <pb />
        <l>l3</l>
        <l>l4</l>
    </div>
</main>

My task is to transform it into:

<div class="line-group">
    <l>l1</l>
    <div class="speach">
       A speach
    </div>
    <l>l2</l>
</div>
<div class="line-group">
    <l>l3</l>
    <l>l4</l>
</div>

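I understand that there may be any number of <pb /> elements, and that this output comes out right only if there are no consecutive <pb /> elements and no <pb /> at the start or the end.

Still, one approach would be to replace every <pb /> with </div><div class="line-group">, and to put <div class="line-group"> at the start and </div> at the end.

How do I do this in XSLT?

I already have templates for all the other tags; I used sp in the example to show that <l> is not the only possible child.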

Answer 1

You can define a key that collects the non-pb elements in each scene into groups according to their nearest preceding pb element.

<xsl:key name="elByPb" match="*[not(self::pb)]"
                       use="concat(generate-id(..), '|',
                                   generate-id(preceding-sibling::pb[1]))" />

Now, to process a scene, create one line-group for the elements before the first pb, and then another one for the elements after each pb:

<xsl:template match="div[@type='scene']">
  <xsl:copy>
    <xsl:copy-of select="@*" />
    <xsl:call-template name="line-group">
      <xsl:with-param name="groupKey" select="concat(generate-id(), '|')" />
    </xsl:call-template>
    <xsl:apply-templates select="pb" />
  </xsl:copy>
</xsl:template>

<xsl:template match="pb" name="line-group">
  <xsl:param name="groupKey"
             select="concat(generate-id(..), '|', generate-id())" />
  <div class="line-group">
    <xsl:apply-templates select="key('elByPb', $groupKey)" />
  </div>
</xsl:template>

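Here I'm relying on the fact that generate-id of an empty node set is (by definition) the empty string, so the elements before the first pb in a section are keyed under "id-of-parent|".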
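For reference, here is a minimal sketch of how the key and the two templates above might be assembled into one complete XSLT 1.0 stylesheet. The identity template and the sp template are assumptions that stand in for the asker's own templates, and xsl:strip-space is added only to keep the output tidy. Note that this answer copies the scene div and puts the line-group divs inside it, which is slightly different from the output shown in the question.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Key from the answer: parent id + '|' + id of nearest preceding sibling pb. -->
  <xsl:key name="elByPb" match="*[not(self::pb)]"
           use="concat(generate-id(..), '|',
                       generate-id(preceding-sibling::pb[1]))"/>

  <!-- Assumption: an identity template stands in for the asker's other templates. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Scene: one group for the elements before the first pb, then one per pb. -->
  <xsl:template match="div[@type='scene']">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:call-template name="line-group">
        <xsl:with-param name="groupKey" select="concat(generate-id(), '|')"/>
      </xsl:call-template>
      <xsl:apply-templates select="pb"/>
    </xsl:copy>
  </xsl:template>

  <!-- Used both as a named template and as the match template for pb. -->
  <xsl:template match="pb" name="line-group">
    <xsl:param name="groupKey"
               select="concat(generate-id(..), '|', generate-id())"/>
    <div class="line-group">
      <xsl:apply-templates select="key('elByPb', $groupKey)"/>
    </div>
  </xsl:template>

  <!-- Assumption: sp becomes div class="speach", as in the question's output. -->
  <xsl:template match="sp">
    <div class="speach">
      <xsl:apply-templates/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

With this arrangement a pb at the very start or end of a scene, or two pb elements in a row, still produces a line-group div; it is simply empty.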

Answer 2

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- Each group ends at a pb; wrap every group in its own div. -->
  <xsl:template match="div">
    <xsl:for-each-group select="*" group-ending-with="pb">
      <div class="line-group">
        <xsl:apply-templates select="current-group()"/>
      </div>
    </xsl:for-each-group>
  </xsl:template>

  <!-- The pb itself produces no output. -->
  <xsl:template match="pb"/>

  <xsl:template match="l">
    <xsl:copy>
      <!-- Copy attributes directly; the built-in rule would emit them as text. -->
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
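The question mentions consecutive <pb /> elements and a <pb /> at the start or end. With group-ending-with="pb", a group can then consist of nothing but the pb itself, which would yield an empty div. A minimal XSLT 2.0 sketch of one way to guard against that follows; the xsl:if test and the narrower match pattern div[@type='scene'] are additions for illustration, not part of the answer above.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="div[@type='scene']">
    <xsl:for-each-group select="*" group-ending-with="pb">
      <!-- Skip groups that contain nothing except the pb itself. -->
      <xsl:if test="current-group()[not(self::pb)]">
        <div class="line-group">
          <xsl:apply-templates select="current-group()[not(self::pb)]"/>
        </div>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Assumption: l is copied as is; other elements use the asker's templates. -->
  <xsl:template match="l">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

Because the pb elements are filtered out of the selection, no separate empty template for pb is needed in this variant.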

Answer 3

disable-output-escaping="yes" can be used, but it is not an elegant approach.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="div">
    <root>
      <!-- Open the first group as raw, unescaped text. -->
      <xsl:text disable-output-escaping="yes">&lt;div class=&quot;line-group&quot;&gt;</xsl:text>
      <xsl:apply-templates/>
      <!-- Close the last group. -->
      <xsl:text disable-output-escaping="yes">&lt;/div&gt;</xsl:text>
    </root>
  </xsl:template>

  <!-- Each pb closes the current group and opens the next one. -->
  <xsl:template match="pb">
    <xsl:text disable-output-escaping="yes">&lt;/div&gt;</xsl:text>
    <xsl:text disable-output-escaping="yes">&lt;div class=&quot;line-group&quot;&gt;</xsl:text>
  </xsl:template>

  <xsl:template match="l">
    <xsl:copy>
      <!-- Copy attributes directly; the built-in rule would emit them as text. -->
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Answer 4

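As a first approximation, I think you have two options.

You can apply templates to each pb element, and in the template for a given pb, generate the div you want with the attributes you want, then walk the tree in depth-first order to process the nodes between that pb and the next one. It would surprise me if you were able to make much use of any of the templates for the input vocabulary from a conventional stylesheet, since none of them will have been written in a way that is sensitive to the possible occurrence of one or more pb elements in the middle of the element being processed.

Handling pb elements that occur in the middle of text correctly poses some challenges. Consider this bit of Hamlet: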
```
<div type="scene" n="III.ii">
  ...
  <sp><speaker>King.</speaker>
    <p>What do you call the play?</p>
  </sp>
  <sp><speaker>Hamlet.</speaker>
    <p><title>The Mousetrap</title>.  Marry, how? 
    <seg xml:id="III.ii.243">Tropically</seg>.
    This play is the image of a murder done
    in Vienna:  Gonzago is the Duke's name;
    his wife, Baptista.  You shall see anon.
    'Tis a knavish piece of work, but what of
    that?  Your Majesty, and we that have 
    <seg xml:id="III.ii.247">free</seg>
    <pb n="107"/>
    souls, it touches us not.  Let the 
    <seg xml:id="III.ii.248">galled jade 
    winch</seg>; our withers are unwrung
    </p>
  </sp>
  ...
</div>
```
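(I have omitted the notes, but left in the seg elements that mark the text they attach to.) What should the beginning of the div for page 107 look like? Does it start with PCDATA?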
```
<div n="pb107">
souls, it touches us not.
```
Or does the enclosing p element need to be split, so that page 107 begins
```
<div n="pb107">
<p part="F">souls, it touches us not.
```
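It is not clear from your question whether your output is supposed to be a new document in the TEI vocabulary with a different structure (in which case the class attribute is unexpected) or HTML (in which case the l elements are unexpected). Since the TEI div does not take text nodes as children, the first option is not available if you are writing a TEI-to-TEI transform.

Alternatively, you can adopt the extremely clever approach outlined by Ian Roberts. It needs to be generalized in two ways.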

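First, the key almost certainly needs to be based not on the nearest sibling pb (preceding-sibling::pb[1]) but on the nearest pb at any depth (preceding::pb[1]).

Second, you need to drop the assumption that an element which begins between two pb elements will also end between those two pb elements. That is, you have to be prepared for one pb to occur between two speeches in a scene and the next pb to occur in the middle of a speech. In the fragment quoted above, Hamlet's speech begins after <pb n="106"/>, but it cannot all appear in the div for page 106: its end (beginning with "souls") must appear in the div for page 107. One way to handle this might be to pass the pb element in question (or its generated ID) as a parameter every time you call apply-templates. Then, in each template, test whether the pb passed in as a parameter is the same as ./preceding::pb[1]. If so, carry on; if not, do nothing.

I won't attempt a full solution, since your requirements are not clear enough to me. Good luck!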
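To make the parameter-passing idea concrete, here is a rough XSLT 1.0 sketch, not a full solution. It takes the "split the enclosing elements" reading of the Hamlet example. The mode name page, the parameter name page, and the div n="pb..." output shape are all assumptions for illustration. Unlike the plain test described above, elements are tested on whether any text inside them belongs to the current page, because an element such as p that straddles a page break has the earlier pb as its own preceding::pb[1].

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- One div per pb at any depth; each pass re-walks the whole scene. -->
  <xsl:template match="div[@type='scene']">
    <xsl:variable name="scene" select="."/>
    <xsl:for-each select=".//pb">
      <div n="pb{@n}">
        <xsl:apply-templates select="$scene/node()" mode="page">
          <!-- Pass the generated id of the pb that opens this page. -->
          <xsl:with-param name="page" select="generate-id()"/>
        </xsl:apply-templates>
      </div>
    </xsl:for-each>
  </xsl:template>

  <!-- Copy an element only if some text inside it falls on the current page. -->
  <xsl:template match="*" mode="page">
    <xsl:param name="page"/>
    <xsl:if test="descendant::text()[generate-id(preceding::pb[1]) = $page]">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates mode="page">
          <xsl:with-param name="page" select="$page"/>
        </xsl:apply-templates>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- Copy a text node only if its nearest preceding pb is the current page. -->
  <xsl:template match="text()" mode="page">
    <xsl:param name="page"/>
    <xsl:if test="generate-id(preceding::pb[1]) = $page">
      <xsl:value-of select="."/>
    </xsl:if>
  </xsl:template>

  <!-- The pb elements themselves are not copied. -->
  <xsl:template match="pb" mode="page"/>
</xsl:stylesheet>
```

The sketch has known gaps: content that precedes the first pb of a scene is not output, empty elements are dropped, and an element with an xml:id that spans a page break is copied into both pages with the same id. It also re-evaluates preceding::pb[1] repeatedly, so a key along the lines described above would be worth adding for large documents.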

XSLT: transforming a linear structure into a non-linear structure
