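I am working on an XSLT transformation and have run into a problem. The text I have to transform contains some HTML tags, but instead of < and > they are written as &lt; and &gt;. I would like to take
ici-import/issue/article/languageVersion/abstract
and transform it into a new element, abstract, with the HTML tags removed.
This is the code I tried to implement: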
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="ici-import/issue/article/languageVersion/">
<xsl:variable name="StripHTML">
<![CDATA[<\s*\w.*?>|<\s*/\s*\w\s*.*?>]]>
</xsl:variable>
<xsl:analyze-string select="abstract" regex="{$StripHTML}">
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
My current XSLT:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<ici-import>
<journal issn="2299-3711" />
<issue>
<xsl:for-each select="issue/section">
<xsl:for-each select="article">
<article>
<languageVersion>
<xsl:if test="abstract != ''">
<abstract>
<xsl:value-of select="abstract"/>
</abstract>
</xsl:if>
</languageVersion>
</article>
</xsl:for-each>
</xsl:for-each>
</issue>
</ici-import>
</xsl:template>
</xsl:stylesheet>
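Answer 0: (score: 0)
If your escaped content is an XML fragment, then in XSLT 3 (supported by Saxon 9.8 HE or Altova 2017 and 2018) you can simply parse the content and push it through a mode that copies only the text.
Here is an example: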
<?xml version="1.0" encoding="utf-8" ?>
<root>
<data><![CDATA[<section id="s1">
<h2>Test</h2>
<p style="font-size: 120%">This is some text.</p>
</section>
<section>
<h2>Another test</h2>
<p>This is some text.</p>
</section>]]></data>
</root>
is transformed by Saxon 9.8 HE with the XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
xmlns:array="http://www.w3.org/2005/xpath-functions/array"
exclude-result-prefixes="xs math map array"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:mode name="strip" on-no-match="text-only-copy"/>
<xsl:template match="data">
<xsl:copy>
<xsl:apply-templates select="parse-xml-fragment(.)" mode="strip"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
into
<?xml version="1.0" encoding="UTF-8"?><root>
<data>
Test
This is some text.
Another test
This is some text.
</data>
</root>
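The same idea can be pointed at the element from the question. The following is only a minimal sketch: it assumes the escaped markup sits in an element named abstract and that it is a well-formed XML fragment once unescaped. It does not reproduce the restructuring done by the question's stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
  <!-- Default mode: copy everything that has no template of its own. -->
  <xsl:mode on-no-match="shallow-copy"/>
  <!-- "strip" mode: drop element markup and keep only text nodes. -->
  <xsl:mode name="strip" on-no-match="text-only-copy"/>

  <!-- Assumption: the element is named abstract and its string value
       is a well-formed XML fragment after unescaping. -->
  <xsl:template match="abstract">
    <xsl:copy>
      <xsl:apply-templates select="parse-xml-fragment(.)" mode="strip"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

If the content is not well-formed (unclosed tags, unquoted attributes), parse-xml-fragment raises an error, so this variant only fits XML-like content.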
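If the content is not XML but HTML, you can import a local copy of David Carlisle's HTML parser, which is written in XSLT 2, and use it to parse the content and push it through a mode that copies only the text: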
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
xmlns:array="http://www.w3.org/2005/xpath-functions/array"
xmlns:htmlparser="data:,dpc"
exclude-result-prefixes="xs math map array htmlparser"
version="3.0">
<!-- adjust to local copy of file -->
<xsl:import href="https://github.com/davidcarlisle/web-xslt/raw/master/htmlparse/htmlparse.xsl"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:mode name="strip" on-no-match="text-only-copy"/>
<xsl:template match="data">
<xsl:copy>
<xsl:apply-templates select="htmlparser:htmlparse(.)/node()" mode="strip"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
With Saxon 9.8 HE, this transforms
<?xml version="1.0" encoding="utf-8" ?>
<root>
<data><![CDATA[<section id=s1>
<h2>Test</h2>
<p style="font-size: 120%">This is some text.</p>
</section>
<section>
<h2>Another test</h2>
<p>This is some text.
<p>This is some other text.
</section>]]></data>
</root>
into
<?xml version="1.0" encoding="UTF-8"?><root>
<data>
Test
This is some text.
Another test
This is some text.
This is some other text.
</data>
</root>
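Of course, since the HTML parser is written in XSLT 2, you can also use it with any XSLT 2 processor; you only need to replace <xsl:mode name="strip" on-no-match="text-only-copy"/> with
<xsl:template match="*" mode="strip">
<!-- mode="#current" keeps nested elements in the strip mode -->
<xsl:apply-templates mode="#current"/>
</xsl:template>
The unnamed <xsl:mode on-no-match="shallow-copy"/> declaration likewise has to be replaced, in that case by an identity template.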
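Put together, an XSLT 2 version of the stylesheet might look roughly like the sketch below. It is untested, and it assumes htmlparse.xsl has been saved next to the stylesheet (the relative href is a placeholder). The identity template stands in for the XSLT 3 shallow-copy mode.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:htmlparser="data:,dpc"
    exclude-result-prefixes="htmlparser"
    version="2.0">
  <!-- Assumption: a local copy of David Carlisle's parser sits
       beside this file; adjust the href to your own location. -->
  <xsl:import href="htmlparse.xsl"/>

  <!-- Identity template, standing in for on-no-match="shallow-copy". -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Stands in for on-no-match="text-only-copy": skip the element,
       keep processing its children in the same mode. -->
  <xsl:template match="*" mode="strip">
    <xsl:apply-templates mode="#current"/>
  </xsl:template>

  <!-- Parse the escaped HTML and keep only its text. -->
  <xsl:template match="data">
    <xsl:copy>
      <xsl:apply-templates select="htmlparser:htmlparse(.)/node()" mode="strip"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Text nodes need no template of their own in the strip mode, because the built-in rule for text nodes copies them in every mode.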