Removing HTML tags from text during an XSLT transformation

Time: 2017-12-08 10:11:34

Tags: xml xslt xslt-2.0

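I'm working on an XSLT transformation and have run into a problem. The text I have to transform contains HTML tags, but instead of < and > it has &lt; and &gt;. I want to take the content of every element at

ici-import/issue/article/languageVersion/abstract

and put it into a new element, abstract, with the HTML tags removed.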

This is the code I tried to implement:

<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="ici-import/issue/article/languageVersion/">
        <xsl:variable name="StripHTML">
            <![CDATA[&lt;\s*\w.*?&gt;|&lt;\s*/\s*\w\s*.*?&gt;]]>
        </xsl:variable>
        <xsl:analyze-string select="abstract" regex="{$StripHTML}">
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>

My current XSLT:

<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <ici-import>
            <journal issn="2299-3711" />
            <issue>
                <xsl:for-each select="issue/section">
                    <xsl:for-each select="article">
                        <article>
                            <languageVersion>
                                <xsl:if test="abstract != ''">
                                    <abstract>
                                        <xsl:value-of select="abstract"/>
                                    </abstract>
                                </xsl:if>
                            </languageVersion>
                        </article>
                    </xsl:for-each>
                </xsl:for-each>
            </issue>
        </ici-import>
    </xsl:template>
</xsl:stylesheet>

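1 answer:

Answer 0 (score: 0)

If your escaped content is an XML fragment, then in XSLT 3 (supported by Saxon 9.8 HE and by Altova 2017 and 2018) you can simply parse the content and push it through a mode that copies only the text.

Here is an example input: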

<?xml version="1.0" encoding="utf-8" ?>
<root>
    <data><![CDATA[<section id="s1">
    <h2>Test</h2>
    <p style="font-size: 120%">This is some text.</p>
</section>
<section>
  <h2>Another test</h2>
  <p>This is some text.</p>
  </section>]]></data>
</root>

Saxon 9.8 HE transforms this input with the following XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:array="http://www.w3.org/2005/xpath-functions/array"
    exclude-result-prefixes="xs math map array"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:mode name="strip" on-no-match="text-only-copy"/>

  <xsl:template match="data">
      <xsl:copy>
          <xsl:apply-templates select="parse-xml-fragment(.)" mode="strip"/>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

The result is:

<?xml version="1.0" encoding="UTF-8"?><root>
    <data>
    Test
    This is some text.


  Another test
  This is some text.
  </data>
</root>
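
The same technique can be carried over to the structure from the question. The following is only a sketch: it assumes each article has at most one abstract child and that the escaped markup inside it is well-formed XML (otherwise parse-xml-fragment raises an error). The element names and the issn value are taken from the question's stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <!-- Named mode that copies text nodes and drops all element markup -->
  <xsl:mode name="strip" on-no-match="text-only-copy"/>

  <xsl:template match="/">
    <ici-import>
      <journal issn="2299-3711"/>
      <issue>
        <xsl:apply-templates select="issue/section/article"/>
      </issue>
    </ici-import>
  </xsl:template>

  <xsl:template match="article">
    <article>
      <languageVersion>
        <!-- Assumption: abstract holds escaped, well-formed XML markup -->
        <xsl:if test="abstract != ''">
          <abstract>
            <xsl:apply-templates select="parse-xml-fragment(abstract)" mode="strip"/>
          </abstract>
        </xsl:if>
      </languageVersion>
    </article>
  </xsl:template>

</xsl:stylesheet>
```

Using a template for article instead of nested xsl:for-each keeps the structure of the question's stylesheet while making the stripping step easy to swap out.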

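If the content is not XML but HTML, you can import a local copy of David Carlisle's HTML parser, which is written in XSLT 2, and use it to parse the content and push it through the same text-only mode: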

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:array="http://www.w3.org/2005/xpath-functions/array"
    xmlns:htmlparser="data:,dpc"
    exclude-result-prefixes="xs math map array htmlparser"
    version="3.0">

  <!-- adjust to local copy of file -->
  <xsl:import href="https://github.com/davidcarlisle/web-xslt/raw/master/htmlparse/htmlparse.xsl"/>

  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:mode name="strip" on-no-match="text-only-copy"/>

  <xsl:template match="data">
      <xsl:copy>
          <xsl:apply-templates select="htmlparser:htmlparse(.)/node()" mode="strip"/>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

With this stylesheet, Saxon 9.8 HE transforms the input

<?xml version="1.0" encoding="utf-8" ?>
<root>
    <data><![CDATA[<section id=s1>
    <h2>Test</h2>
    <p style="font-size: 120%">This is some text.</p>
</section>
<section>
  <h2>Another test</h2>
  <p>This is some text.
  <p>This is some other text.
  </section>]]></data>
</root>

into

<?xml version="1.0" encoding="UTF-8"?><root>
    <data>
    Test
    This is some text.


  Another test
  This is some text.
  This is some other text.
  </data>
</root>

Of course, since the HTML parser is written in XSLT 2, you can also use it with any XSLT 2 processor; you only need to replace <xsl:mode name="strip" on-no-match="text-only-copy"/> with

<xsl:template match="*" mode="strip">
  <!-- stay in the strip mode so that nested elements are stripped as well -->
  <xsl:apply-templates mode="#current"/>
</xsl:template>
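
Put together, an XSLT 2 version of the HTML-stripping stylesheet might look like the sketch below. It is not taken from the original answer: the href value htmlparse.xsl is a hypothetical local path, and the identity template is added on the assumption that the other declaration, <xsl:mode on-no-match="shallow-copy"/>, also has to go, since xsl:mode is an XSLT 3 instruction.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:htmlparser="data:,dpc"
    exclude-result-prefixes="htmlparser"
    version="2.0">

  <!-- hypothetical path: adjust to your local copy of David Carlisle's parser -->
  <xsl:import href="htmlparse.xsl"/>

  <!-- identity template: stands in for <xsl:mode on-no-match="shallow-copy"/> -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- stands in for the text-only-copy mode: drop the element, keep processing
       its children in the same mode; text nodes are copied by the built-in rule -->
  <xsl:template match="*" mode="strip">
    <xsl:apply-templates mode="#current"/>
  </xsl:template>

  <xsl:template match="data">
    <xsl:copy>
      <xsl:apply-templates select="htmlparser:htmlparse(.)/node()" mode="strip"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The mode="#current" attribute matters here: without it, the children would be processed in the default mode, where the identity template would copy the nested elements instead of stripping them.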