My source is:
<content>
<caption>text 1</caption>
<element1>Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text <bold>file</bold> is a <a>file</a> type typically identified by the .txt file name extension.</element1>
<section1>
<element2>Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text file is a file type typically identified by the .txt file name extension.</element2>
</section1>
</content>
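I'm trying to extract, and create unique IDs for, elements (which could be any element) that have both child elements (character-level elements) and text, as well as elements that contain only text. The <bold> and <a> elements should not be split out on their own.
Any ideas would be highly appreciated...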
Answer 0 (score: 0)
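I'm not sure whether you want to preserve the hierarchy or output a flat list of the elements you describe; the following simply extracts the described elements as a flat list (preserving their contents, though), with the id generated by the XSLT processor: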
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*[not(*) and text()[normalize-space()]] | *[* and text()[normalize-space()]]">
<xsl:copy>
<xsl:attribute name="id" select="generate-id()"/>
<xsl:apply-templates select="@* , node()" mode="copy"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@* | node()" mode="copy">
<xsl:copy>
<xsl:apply-templates select="@* , node()" mode="#current"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applied to your input sample, Saxon 9 outputs
<?xml version="1.0" encoding="UTF-8"?>
<caption id="d1e2">text 1</caption>
<element1 id="d1e4">Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text <bold>file</bold> is a <a>file</a> type typically identified by the .txt file name extension.</element1>
<element2 id="d1e13">Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text file is a file type typically identified by the .txt file name extension.</element2>
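If the hierarchy should be preserved rather than flattened, one possible variant is sketched below. It is an untested assumption about what is wanted, not something taken from the question: an identity template copies the wrapper elements such as content and section1 unchanged, and the id is still added only to elements that have non-whitespace text children. The match pattern *[text()[normalize-space()]] is a shorter equivalent of the two-branch union pattern, and the copy mode still keeps inline children such as bold and a from receiving ids of their own.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy wrappers (e.g. content, section1) as they are -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* , node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element with a non-whitespace text child gets a generated id;
       its content is then copied in "copy" mode so that inline children
       are kept in place and receive no id of their own -->
  <xsl:template match="*[text()[normalize-space()]]">
    <xsl:copy>
      <xsl:attribute name="id" select="generate-id()"/>
      <xsl:apply-templates select="@* , node()" mode="copy"/>
    </xsl:copy>
  </xsl:template>

  <!-- Plain deep copy used inside id-bearing elements -->
  <xsl:template match="@* | node()" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="@* , node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The values returned by generate-id() are processor-specific, so they are only guaranteed to be unique within a single transformation run and may differ between processors.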