Question

I have a block of XML formatted like this:

<line n="2">
      <orig>of right hool herte <ex>&amp;</ex> in our<ex>e</ex><note place="bottom" anchored="true" xml:id="explanatory">Although “r” on the painted panels of the chapel is consistently written with an otiose mark when it concludes a word, the mark here is rendered more heavily and with a dot indicating suspension above the r. This rendering as “our<ex>e</ex>” is a linguistic outlier for the area based on the electronic <emph rend="italic">Linguistic Atlas of Late Medieval English</emph>’s linguistic profiles for “oure,” “our,” and “our<ex>e</ex>.” See eLALME's <ref target="http://archive.ling.ed.ac.uk/ihd/elalme_scripts/mapping/user-defined_maps.html">User Defined Maps</ref> for more information. Unfortunately the current online version (as of 12 July 2014) does not allow direct linking between static dotmaps and linguistic profiles.</note> best entent</orig>
</line>

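I need to be able to reduce it to plaintext: "of right hool herte &amp; in oure best entent", and then tokenize on spaces to get a list of comma- or tag-separated values. I got the plaintext with the following XSLT: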

<xsl:template match="tei:line" >
        <xsl:apply-templates />   
</xsl:template>

<xsl:template match="orig">
    <xsl:apply-templates/>
</xsl:template>

<xsl:template match="ex">
    <xsl:apply-templates/>
</xsl:template>

<xsl:template match="note"/>

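However, I can't get the tokenize function to work on the output of apply-templates. If I use value-of instead, the templates for the nested tags no longer apply. Is there a way to run apply-templates on the XML and then tokenize each element, all in a single XSLT? Thanks!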

Answer 1

You don't need tokenize() to get this output:

  of right hool herte & in oure best entent

The identity transformation, plus a template that suppresses note, will do it for you:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="note"/>

</xsl:stylesheet>

If you want the result comma-separated, you can capture the text output above in a variable and then apply tokenize to it as follows:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="result">
    <xsl:apply-templates/>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="note"/>

  <xsl:template match="/">
    <xsl:value-of select="string-join(tokenize(normalize-space($result), ' '), ',')"/>
  </xsl:template>

</xsl:stylesheet>

Given your input XML, the XSLT above produces the following text:

of,right,hool,herte,&,in,oure,best,entent
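Since the separator here is a single character, the same comma-separated text can probably be produced without tokenize() at all. The following is only a sketch under the same assumptions as the stylesheet above (elements in no namespace, text output): it swaps the tokenize/string-join pair for the standard XPath translate() function, which replaces each space left by normalize-space() with a comma.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Capture the note-free text in a variable, as in the stylesheet above -->
  <xsl:variable name="result">
    <xsl:apply-templates/>
  </xsl:variable>

  <!-- Identity transformation: copies everything that is not suppressed -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the explanatory note entirely -->
  <xsl:template match="note"/>

  <!-- Collapse whitespace runs, then turn each remaining space into a comma -->
  <xsl:template match="/">
    <xsl:value-of select="translate(normalize-space($result), ' ', ',')"/>
  </xsl:template>

</xsl:stylesheet>
```

This only works for a one-character separator; for a multi-character separator such as ", ", the tokenize and string-join version is the better fit.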

Answer 2

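I need to be able to reduce it to plaintext: "of right hool herte &amp; in oure best entent", and then tokenize on spaces to get a list of comma- or tag-separated values.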

I'm not sure what "tag-separated values" means. Given the following test input:

XML

<root>
    <line n="2">
          <orig>of right hool herte <ex>&amp;</ex> in our<ex>e</ex><note place="bottom" anchored="true" xml:id="explanatory">Although “r” on the painted panels of the chapel is consistently written with an otiose mark when it concludes a word, the mark here is rendered more heavily and with a dot indicating suspension above the r. This rendering as “our<ex>e</ex>” is a linguistic outlier for the area based on the electronic <emph rend="italic">Linguistic Atlas of Late Medieval English</emph>’s linguistic profiles for “oure,” “our,” and “our<ex>e</ex>.” See eLALME's <ref target="http://archive.ling.ed.ac.uk/ihd/elalme_scripts/mapping/user-defined_maps.html">User Defined Maps</ref> for more information. Unfortunately the current online version (as of 12 July 2014) does not allow direct linking between static dotmaps and linguistic profiles.</note> best entent</orig>
    </line>
</root>

the following stylesheet:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/root">
    <xsl:copy>
        <xsl:apply-templates select="line"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="line">
    <xsl:variable name="line-text">
        <xsl:apply-templates/>
    </xsl:variable>
    <xsl:copy>
        <xsl:copy-of select="@n"/>
        <xsl:value-of select="tokenize(normalize-space($line-text), ' ')" separator=", "/>
    </xsl:copy>
</xsl:template>

<xsl:template match="note"/>

</xsl:stylesheet>

will return:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <line n="2">of, right, hool, herte, &amp;, in, oure, best, entent</line>
</root>
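If "tag-separated" was meant as "each word wrapped in its own element", a variation along these lines might work. It is only a sketch: the element name w is a hypothetical choice, not something taken from the question, and the input is assumed to be the test document above with its elements in no namespace.

```xml
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/root">
    <xsl:copy>
        <xsl:apply-templates select="line"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="line">
    <!-- Run the templates first, so that note is already dropped -->
    <xsl:variable name="line-text">
        <xsl:apply-templates/>
    </xsl:variable>
    <xsl:copy>
        <xsl:copy-of select="@n"/>
        <!-- Wrap each whitespace-separated token in a (hypothetical) w element -->
        <xsl:for-each select="tokenize(normalize-space($line-text), ' ')">
            <w><xsl:value-of select="."/></w>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

<xsl:template match="note"/>

</xsl:stylesheet>
```

The structure is the same as in the stylesheet above: the variable holds the result of apply-templates, and tokenize() then works on its string value. Only the last step differs, emitting one element per token instead of joining the tokens with a separator.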

Applying the XSLT tokenize function to the result of apply-templates
