I want to use XSLT to select a summary of this text while keeping its HTML-style formatting elements. Here is a sample of the XML:
<PUBLDES>The <IT>European Journal of Cancer (including EJC Supplements),</IT>
is an international comprehensive oncology journal that publishes original
research, editorial comments, review articles and news on experimental oncology,
clinical oncology (medical, paediatric, radiation, surgical), translational
oncology, and on cancer epidemiology and prevention. The Journal now has online
submission for authors. Please submit manuscripts at
<SURL>http://ees.elsevier.com/ejc</SURL> and follow the instructions on the
site.<P/>
The <IT>European Journal of Cancer (including EJC Supplements)</IT> is the
official Journal of the European Organisation for Research and Treatment
of Cancer (EORTC), the European CanCer Organisation (ECCO), the European
Association for Cancer Research (EACR), the the European Society of Breast
Cancer Specialists (EUSOMA) and the European School of Oncology (ESO). <P/>
Supplements to the <IT>European Journal of Cancer</IT> are published under
the title <IT>EJC Supplements</IT> (ISSN 1359-6349). All subscribers to
<IT>European Journal of Cancer</IT> automatically receive this publication.<P/>
To access the latest tables of contents, abstracts and full-text articles
from <IT>EJC</IT>, including Articles-in-Press, please visit <URL>
<HREF>http://www.sciencedirect.com/science/journal/09598049</HREF>
<HTXT>ScienceDirect</HTXT>
</URL>.</PUBLDES>
How can I get the first 45 words of this together with the markup inside them? When I use substring() or concat(), the markup (e.g. <IT>) is stripped.
Answer (score: 4)
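It might be better to do this programmatically rather than in pure XSLT, but if you must use XSLT, here is one approach. It does involve several stylesheets, but if you can use extension functions, you can use node-sets to combine them into one big (and nasty) stylesheet.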
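If your processor supports the EXSLT `exsl:node-set()` extension function (libxslt's xsltproc does; MSXML provides `msxsl:node-set()` instead), the three passes walked through below can be chained in a single stylesheet by capturing each pass in a variable. This is a sketch under that assumption, not the answer's own code: the mode names `tokens`, `truncate` and `unwrap` are illustrative, the second pass folds the "is a WORD" and "fits within the limit" cases into one branch by counting `descendant-or-self::WORD`, the third pass simply puts one space before every word after the first, and `WORDCOUNT` is assumed to be positive.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:exsl="http://exslt.org/common" exclude-result-prefixes="exsl">
  <!-- Number of words to keep (assumed to be positive) -->
  <xsl:param name="WORDCOUNT" select="45"/>

  <!-- Driver: each pass is captured in a variable and turned back into a node-set -->
  <xsl:template match="/">
    <xsl:variable name="tokens">
      <xsl:apply-templates select="node()" mode="tokens"/>
    </xsl:variable>
    <xsl:variable name="truncated">
      <xsl:apply-templates select="exsl:node-set($tokens)/*" mode="truncate">
        <xsl:with-param name="previousWords" select="0"/>
      </xsl:apply-templates>
    </xsl:variable>
    <xsl:apply-templates select="exsl:node-set($truncated)/node()" mode="unwrap"/>
  </xsl:template>

  <!-- Pass 1: copy the input, wrapping every word in a WORD element -->
  <xsl:template match="@*|node()" mode="tokens">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="tokens"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="text()" mode="tokens" priority="1">
    <xsl:call-template name="tokenize">
      <xsl:with-param name="string" select="normalize-space(.)"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Expects a normalize-space()d string, so words are separated by single spaces -->
  <xsl:template name="tokenize">
    <xsl:param name="string"/>
    <xsl:choose>
      <xsl:when test="contains($string, ' ')">
        <WORD><xsl:value-of select="substring-before($string, ' ')"/></WORD>
        <xsl:call-template name="tokenize">
          <xsl:with-param name="string" select="substring-after($string, ' ')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:when test="$string != ''">
        <WORD><xsl:value-of select="$string"/></WORD>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

  <!-- Pass 2: keep elements while the running word total stays within $WORDCOUNT -->
  <xsl:template match="*" mode="truncate">
    <xsl:param name="previousWords"/>
    <!-- Words in this element, counting the element itself if it is a WORD -->
    <xsl:variable name="words" select="count(descendant-or-self::WORD)"/>
    <xsl:choose>
      <xsl:when test="$previousWords + $words &lt;= $WORDCOUNT">
        <!-- Everything fits: copy the whole element, then move on to the next sibling -->
        <xsl:copy-of select="."/>
        <xsl:if test="$previousWords + $words &lt; $WORDCOUNT">
          <xsl:apply-templates select="following-sibling::*[1]" mode="truncate">
            <xsl:with-param name="previousWords" select="$previousWords + $words"/>
          </xsl:apply-templates>
        </xsl:if>
      </xsl:when>
      <xsl:otherwise>
        <!-- Too many words: keep the element and its attributes, descend into its children -->
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="*[1]" mode="truncate">
            <xsl:with-param name="previousWords" select="$previousWords"/>
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Pass 3: turn WORD elements back into space-separated text -->
  <xsl:template match="@*|node()" mode="unwrap">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="unwrap"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="WORD" mode="unwrap">
    <xsl:if test="preceding::WORD">
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

With xsltproc, for example, the limit can be overridden from the command line with `--param WORDCOUNT 10`.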
The first stylesheet copies the original XML but "tokenizes" any text it finds, so that each word in the text becomes a separate WORD element.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Copy existing nodes and attributes -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Match text nodes. The explicit priority avoids an ambiguous match with the
       template above (node() and text() both default to priority -0.5), and
       normalize-space() turns line breaks into spaces so that words separated
       only by a newline are still split apart. -->
  <xsl:template match="text()" priority="1">
    <xsl:call-template name="tokenize">
      <xsl:with-param name="string" select="normalize-space(.)"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Splits a string into separate elements for each word -->
  <xsl:template name="tokenize">
    <xsl:param name="string"/>
    <xsl:param name="delimiter" select="' '"/>
    <xsl:choose>
      <xsl:when test="$delimiter and contains($string, $delimiter)">
        <xsl:variable name="word" select="normalize-space(substring-before($string, $delimiter))"/>
        <xsl:if test="string-length($word) > 0">
          <WORD>
            <xsl:value-of select="$word"/>
          </WORD>
        </xsl:if>
        <xsl:call-template name="tokenize">
          <xsl:with-param name="string" select="substring-after($string, $delimiter)"/>
          <xsl:with-param name="delimiter" select="$delimiter"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="word" select="normalize-space($string)"/>
        <xsl:if test="string-length($word) > 0">
          <WORD>
            <xsl:value-of select="$word"/>
          </WORD>
        </xsl:if>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
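I took the XSLT template for "tokenizing" a string of text from a question I asked here:
tokenizing-and-sorting-with-xslt-1-0
(Note that in XSLT 2.0, the XPath 2.0 tokenize() function would make the above much simpler.)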
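As a sketch of that simplification, assuming an XSLT 2.0 processor such as Saxon, the recursive tokenize template can be replaced by a call to tokenize() inside the text() template. This stands in for the first stylesheet only; the second pass stays the same.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- Copy existing nodes and attributes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Replace each text node by one WORD element per whitespace-separated word;
       the explicit priority avoids an ambiguous match with the identity template -->
  <xsl:template match="text()" priority="1">
    <xsl:for-each select="tokenize(normalize-space(.), ' ')[. ne '']">
      <WORD><xsl:value-of select="."/></WORD>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```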
This gives you XML like the following:
<PUBLDES>
  <WORD>The</WORD>
  <IT>
    <WORD>European</WORD>
    <WORD>Journal</WORD>
    <WORD>of</WORD>
    ....
and so on.
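Next, a second stylesheet walks this XML document and outputs only the first 45 WORD elements. To do this, I repeatedly apply a template, keeping a running total of the number of WORDs found so far. When a node is matched, there are three possibilities:

1. It is a WORD element: copy it and, if the limit has not yet been reached, continue processing at its next sibling.
2. It is an element whose descendant WORDs all fit within the remaining limit: copy the whole element and, if the limit has not yet been reached, continue processing at its next sibling.
3. It is an element whose descendant WORDs would exceed the remaining limit: copy the element itself and continue processing at its first child.

Here is the stylesheet, in all its horribleness: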
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Number of words to keep; can be overridden as a stylesheet parameter -->
  <xsl:param name="WORDCOUNT" select="45"/>
  <!-- Match the root node and start at the document element -->
  <xsl:template match="/">
    <xsl:apply-templates select="descendant::*[1]" mode="word">
      <xsl:with-param name="previousWords" select="0"/>
    </xsl:apply-templates>
  </xsl:template>
  <!-- Match any node -->
  <xsl:template match="node()" mode="word">
    <xsl:param name="previousWords"/>
    <!-- Number of words below the element (at any depth) -->
    <xsl:variable name="childWords" select="count(descendant::WORD)"/>
    <xsl:choose>
      <!-- Matching a WORD element -->
      <xsl:when test="local-name(.) = 'WORD'">
        <!-- Copy the word -->
        <WORD>
          <xsl:value-of select="."/>
        </WORD>
        <!-- If there are still words to output, continue processing at next sibling -->
        <xsl:if test="$previousWords + 1 &lt; $WORDCOUNT">
          <xsl:apply-templates select="following-sibling::*[1]" mode="word">
            <xsl:with-param name="previousWords" select="$previousWords + 1"/>
          </xsl:apply-templates>
        </xsl:if>
      </xsl:when>
      <!-- Match a node where the number of words below it is within allowed limit -->
      <xsl:when test="$childWords &lt;= $WORDCOUNT - $previousWords">
        <!-- Copy the element together with its attributes and all its descendants -->
        <xsl:copy-of select="."/>
        <!-- If there are still words to output, continue processing at next sibling -->
        <xsl:if test="$previousWords + $childWords &lt; $WORDCOUNT">
          <xsl:apply-templates select="following-sibling::*[1]" mode="word">
            <xsl:with-param name="previousWords" select="$previousWords + $childWords"/>
          </xsl:apply-templates>
        </xsl:if>
      </xsl:when>
      <!-- Match nodes where the number of words below it would exceed current limit -->
      <xsl:otherwise>
        <!-- Copy the element and its attributes, but not its content -->
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <!-- Continue processing at its first child element -->
          <xsl:apply-templates select="*[1]" mode="word">
            <xsl:with-param name="previousWords" select="$previousWords"/>
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
If WORDCOUNT were set to 4, say, this would give the following output:
<PUBLDES>
  <WORD>The</WORD>
  <IT>
    <WORD>European</WORD>
    <WORD>Journal</WORD>
    <WORD>of</WORD>
  </IT>
</PUBLDES>
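Of course, you would then need yet another transformation to remove the WORD elements while keeping their text. That should be fairly straightforward.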
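A minimal sketch of that last step, as a third XSLT 1.0 stylesheet: an identity copy plus a template that replaces each WORD by its text, preceded by a single space for every word except the first (checked with preceding::WORD). Because the tokenizing pass threw the original whitespace away, this only approximates it: a space can end up just inside an inline element such as <IT>, and punctuation that originally followed an element directly (such as the full stop after </URL>) gets a space in front of it.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Unwrap each WORD into plain text, with a single space before
       every word except the first one in the document -->
  <xsl:template match="WORD">
    <xsl:if test="preceding::WORD">
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```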
This is all pretty nasty, but it's the best I can come up with right now!