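String manipulation in XSLT, multiple inputs and recursion

Date: 2014-08-23 20:00:13

Tags: xml xslt-2.0

Related to this question: Creating New Functions

I'm still working toward an elegant function that searches a string for certain triggers and, if one is found, returns the substring-after up to an end trigger. Example: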

    <Data>Moby Dick [videorecording] / United Artists ; A Moulin Picture ; screenplay by Ray Bradbury and John Huston ; directed by John Huston.</Data>

    <Data>Oliver Twist [videorecording] / Independent Producers ; screen play by David Lean and Stanley Haynes ; produced by Ronald Neame ; directed by David Lean.</Data>

    <Data>Romeo + Juliet [videorecording] / Twentieth Century Fox presents a Bazmark production ; producers, Gabriella Martinelli, Baz Luhrmann ; screenplay, Craig Pearce, Baz Luhrmann.</Data>

Desired result:

...
<writer>Ray Bradbury</writer>
<writer>John Huston</writer>
...

...
<writer>David Lean</writer>
<writer>Stanley Haynes</writer>
...

...
<writer>Craig Pearce</writer>
<writer>Baz Luhrmann</writer>
...

My attempt:

 <xsl:function name="foo:personSep">
        <xsl:param name="string"/>
        <xsl:param name="delim"/>
        <xsl:choose>
            <xsl:when test="not(contains($string,$delim))">
                <writer>
                    <xsl:value-of select="$string"/>
                </writer>
            </xsl:when>
            <xsl:when test="contains($string,$delim)">
                <writer>
                    <xsl:value-of select="substring-before($string, $delim)"/>
                </writer>
                <xsl:sequence select="foo:personSep(substring-after($string, $delim), $delim)"/>
            </xsl:when>
            <xsl:otherwise>
                <writer> 
                </writer>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>

    <xsl:template match="ss:Cell[3]/ss:Data" mode="writer">
        <xsl:variable name="cell3Data" select="normalize-space(.)"/>

        <xsl:variable name="writerFind" as="xs:string*"
            select="('screenplay by ','screen play by ','screenplay, ')"/>

        <xsl:for-each select="1 to count($writerFind)">
            <xsl:variable name="x" select="."/>
            <xsl:variable name="writer" select="substring-after($cell3Data, $writerFind[$x])"/>
            <xsl:if test="$writer != ''">
                <xsl:if test="contains($writer, ' and ')">
                    <xsl:sequence
                        select="foo:personSep(functx:right-trim(replace($writer, '[;\.].*$', '')),' and ')"
                    />
                </xsl:if>
                <xsl:if test="contains($writer, ', ')">
                    <xsl:sequence
                        select="foo:personSep(functx:right-trim(replace($writer, '[;\.].*$', '')),', ')"
                    />
                </xsl:if>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>

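My bootstrappy, kludgeriffic version mostly works, but I'm sure there is a leaner, cleaner solution. It also fails to catch any version that contains both a comma AND an "and", such as:

"screenplay by John Smith, Ed Jones and Robert Danvers"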

1 answer:

Answer 0: (score: 1)

Here is a template that matches Data and extracts the writers:

<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"    
    version="2.0">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="Data">
  <xsl:analyze-string select="." regex="(screenplay by |screen play by |screenplay, )([^.;]+)(;|\.|$)">
    <xsl:matching-substring>
      <xsl:analyze-string select="regex-group(2)" regex="(\w+(\s*\w*))(\s*(,|and|$))">
        <xsl:matching-substring>
          <writer><xsl:value-of select="normalize-space(regex-group(1))"/></writer>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>

When I run this with Saxon 9.5 on the input

<Root>
   <Data>Moby Dick [videorecording] / United Artists ; A Moulin Picture ; screenplay by Ray Bradbury and John Huston ; directed by John Huston.</Data>

    <Data>Oliver Twist [videorecording] / Independent Producers ; screen play by David Lean and Stanley Haynes ; produced by Ronald Neame ; directed by David Lean.</Data>

    <Data>Romeo + Juliet [videorecording] / Twentieth Century Fox presents a Bazmark production ; producers, Gabriella Martinelli, Baz Luhrmann ; screenplay, Craig Pearce, Baz Luhrmann.</Data>
</Root>

I get the result

   <writer>Ray Bradbury</writer>
<writer>John Huston</writer>

    <writer>David Lean</writer>
<writer>Stanley Haynes</writer>

    <writer>Craig Pearce</writer>
<writer>Baz Luhrmann</writer>

If you would rather have a function, this works:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs mf"    
    version="2.0">

<xsl:output method="xml" indent="yes"/>

<xsl:function name="mf:extract" as="element()*">
  <xsl:param name="input" as="xs:string"/>
  <xsl:param name="markers" as="xs:string*"/>
  <xsl:param name="element-name" as="xs:string"/>
  <xsl:analyze-string select="$input" regex="({string-join($markers, '|')})([^.;]+)(;|\.|$)">
    <xsl:matching-substring>
      <xsl:analyze-string select="regex-group(2)" regex="(\w+(\s*\w*))(\s*(,|and|$))">
        <xsl:matching-substring>
          <xsl:element name="{$element-name}"><xsl:value-of select="normalize-space(regex-group(1))"/></xsl:element>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:function>

<xsl:template match="Data">
  <xsl:sequence select="mf:extract(., ('screenplay by ', 'screen play by ', 'screenplay, '), 'writer')"/>
</xsl:template>

</xsl:stylesheet>
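
One limitation of the inner regular expression above is that it captures at most two words per name, so a three-word name would likely be cut short. As a possible alternative, here is a minimal sketch that keeps the same outer `xsl:analyze-string` but splits the captured name list with `tokenize()`. The function name `mf:extract-tokens` is hypothetical, and the sketch assumes the markers contain no regex metacharacters and that names are separated only by commas and the word "and".

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs mf"
    version="2.0">

<xsl:output method="xml" indent="yes"/>

<!-- Hypothetical variant of mf:extract: same outer regex, but the names are
     split with tokenize() instead of a second xsl:analyze-string. -->
<xsl:function name="mf:extract-tokens" as="element()*">
  <xsl:param name="input" as="xs:string"/>
  <xsl:param name="markers" as="xs:string*"/>
  <xsl:param name="element-name" as="xs:string"/>
  <!-- Assumption: the markers contain no regex metacharacters. -->
  <xsl:analyze-string select="$input" regex="({string-join($markers, '|')})([^.;]+)(;|\.|$)">
    <xsl:matching-substring>
      <!-- Split on a comma (optionally followed by "and") or on a standalone "and";
           the predicate drops any empty tokens. -->
      <xsl:for-each select="tokenize(normalize-space(regex-group(2)),
                                     '\s*,\s*(and\s+)?|\s+and\s+')[. != '']">
        <xsl:element name="{$element-name}"><xsl:value-of select="."/></xsl:element>
      </xsl:for-each>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:function>

<xsl:template match="Data">
  <xsl:sequence select="mf:extract-tokens(., ('screenplay by ', 'screen play by ', 'screenplay, '), 'writer')"/>
</xsl:template>

</xsl:stylesheet>
```

Because the separator pattern requires whitespace on both sides of "and", a name such as "Anderson" should not be split. The same function could presumably be reused for other roles, for example by passing `('directed by ')` as the markers and `'director'` as the element name.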