I have a block of XML formatted like this:
<line n="2">
  <orig>of right hool herte <ex>&amp;</ex> in our<ex>e</ex><note place="bottom" anchored="true" xml:id="explanatory">Although “r” on the painted panels of the chapel is consistently written with an otiose mark when it concludes a word, the mark here is rendered more heavily and with a dot indicating suspension above the r. This rendering as “our<ex>e</ex>” is a linguistic outlier for the area based on the electronic <emph rend="italic">Linguistic Atlas of Late Medieval English</emph>’s linguistic profiles for “oure,” “our,” and “our<ex>e</ex>.” See eLALME's <ref target="http://archive.ling.ed.ac.uk/ihd/elalme_scripts/mapping/user-defined_maps.html">User Defined Maps</ref> for more information. Unfortunately the current online version (as of 12 July 2014) does not allow direct linking between static dotmaps and linguistic profiles.</note> best entent</orig>
</line>
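I need to be able to reduce it to plain text, "of right hool herte & in oure best entent", and then tokenize on spaces to get a list of comma- or token-separated values. I got the plain text with the following XSLT: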
<xsl:template match="tei:line">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="orig">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="ex">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="note"/>
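However, I can't get the tokenize function to work with apply-templates. If I try value-of instead, the templates for the nested tags no longer apply. Is there a way to run apply-templates on the XML and then tokenize each element, all within a single XSLT? Thanks!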
Answer 0 (score: 0)
You don't need tokenize() to get this output:
of right hool herte & in oure best entent
An identity transformation plus a template that suppresses note will do it for you:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="note"/>
</xsl:stylesheet>
If you want it comma-separated, you can capture the text output above in a variable and then apply tokenize, as you mentioned:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Capture the text produced by the templates below. -->
  <xsl:variable name="result">
    <xsl:apply-templates/>
  </xsl:variable>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Drop the note and everything inside it. -->
  <xsl:template match="note"/>
  <!-- Split the captured text on spaces and rejoin it with commas. -->
  <xsl:template match="/">
    <xsl:value-of select="string-join(tokenize(normalize-space($result), ' '), ',')"/>
  </xsl:template>
</xsl:stylesheet>
Given your input XML, the XSLT above will produce the following text:
of,right,hool,herte,&,in,oure,best,entent
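Since the tokens are split on single spaces and rejoined with single commas, the same result can probably be reached without tokenize() at all. The following is only a sketch, not part of the original answer. It assumes the input elements are in no namespace, relies on the built-in template rules to copy text nodes, and uses translate() so that it should also run on an XSLT 1.0 processor:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Suppress the editorial note and everything inside it. -->
  <xsl:template match="note"/>

  <xsl:template match="/">
    <!-- The built-in rules copy the remaining text nodes into the variable. -->
    <xsl:variable name="result">
      <xsl:apply-templates/>
    </xsl:variable>
    <!-- Collapse runs of whitespace, then swap each remaining space for a comma. -->
    <xsl:value-of select="translate(normalize-space($result), ' ', ',')"/>
  </xsl:template>
</xsl:stylesheet>
```

For the input in the question this should give the same comma-separated line as above. translate() only replaces single characters, so a multi-character separator such as ", " still needs tokenize() or string-join().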
Answer 1 (score: 0)
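I need to be able to reduce it to plain text, "of right hool herte & in oure best entent", and then tokenize on spaces to get a list of comma- or token-separated values.
I'm not sure what "token-separated values" means. Given the following test input: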
XML
<root>
  <line n="2">
    <orig>of right hool herte <ex>&amp;</ex> in our<ex>e</ex><note place="bottom" anchored="true" xml:id="explanatory">Although “r” on the painted panels of the chapel is consistently written with an otiose mark when it concludes a word, the mark here is rendered more heavily and with a dot indicating suspension above the r. This rendering as “our<ex>e</ex>” is a linguistic outlier for the area based on the electronic <emph rend="italic">Linguistic Atlas of Late Medieval English</emph>’s linguistic profiles for “oure,” “our,” and “our<ex>e</ex>.” See eLALME's <ref target="http://archive.ling.ed.ac.uk/ihd/elalme_scripts/mapping/user-defined_maps.html">User Defined Maps</ref> for more information. Unfortunately the current online version (as of 12 July 2014) does not allow direct linking between static dotmaps and linguistic profiles.</note> best entent</orig>
  </line>
</root>
the following stylesheet:
XSLT 2.0
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/root">
    <xsl:copy>
      <xsl:apply-templates select="line"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="line">
    <xsl:variable name="line-text">
      <xsl:apply-templates/>
    </xsl:variable>
    <xsl:copy>
      <xsl:copy-of select="@n"/>
      <xsl:value-of select="tokenize(normalize-space($line-text), ' ')" separator=", "/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="note"/>
</xsl:stylesheet>
will return:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <line n="2">of, right, hool, herte, &amp;, in, oure, best, entent</line>
</root>
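If plain text is wanted rather than XML, the same per-line approach could be adapted roughly as follows. This is a sketch under assumptions, not something from the original answer: the `sep` parameter is a hypothetical name introduced here so the delimiter can be changed from outside the stylesheet, and each `line` element is written on its own output line.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical parameter (not in the original answer): the delimiter
       placed between tokens. Defaults to a comma. -->
  <xsl:param name="sep" select="','"/>

  <xsl:template match="/">
    <xsl:for-each select="//line">
      <!-- Collect the text of this line; the note is suppressed below. -->
      <xsl:variable name="line-text">
        <xsl:apply-templates/>
      </xsl:variable>
      <!-- The separator attribute is an attribute value template. -->
      <xsl:value-of select="tokenize(normalize-space($line-text), ' ')" separator="{$sep}"/>
      <!-- End each line's tokens with a newline. -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="note"/>
</xsl:stylesheet>
```

With the default separator and the test input above, this should emit the nine tokens joined by commas on a single line. Because the output method is text, the ampersand is written as a bare & rather than being escaped.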