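Using xsl:analyze-string to change multiple words that may repeat

Asked: 2013-04-15 15:11:57

Tags: xml xslt xslt-2.0

I want to change titles in First Letter Caps into proper title case, i.e. with articles, conjunctions, and selected prepositions in lowercase. Originally I hoped to do this with an XML document holding a list of "stop words", but the closest I have come to success is using a regular expression in analyze-string. The problem is that, being new to XSLT, I don't know how to make it recursive without causing an infinite loop. Also, ideally this would be a function rather than a template. I appreciate any help from the experts out there.

Input: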

<element>
    <title>The String Is In First Letter Caps And May Have A Word Or Words Such As A, An, Or The And And, But, For, As, At, In, Or When.</title>
</element> 

XSLT:

<xsl:template name="proper-case" match="/element/title">
<xsl:param name="title" select="."/> 
    <xsl:analyze-string select="$title" regex="\WA\W|\WAn\W|\WThe\W|\WAnd\W|\WBut\W|\WFor\W|\WNor\W|\WOr\W|\WSo\W|\WYet\W|\WAs\W|\WAt\W|\WBy\W|\WIf\W|\WIn\W|\WOf\W|\WOn\W|\WTo\W|\WWith\W" flags="i">
    <xsl:matching-substring>
        <xsl:value-of select="lower-case(.)"/>                               
    </xsl:matching-substring>
    <xsl:non-matching-substring>
        <xsl:value-of select="."/>               
    </xsl:non-matching-substring>
    </xsl:analyze-string>
</xsl:template>

Current output:

<element>
    <title>The String Is in First Letter Caps and May Have a Word or Words Such as A, an, or The and And, but, for, as, at, in, or When.</title>
</element>

Desired output:

<element>
    <title>The String Is in First Letter Caps and May Have a Word or Words Such as a, an, or the and and, but, for, as, at, in, or when.</title>
</element>

2 answers:

Answer 0 (score: 1)

I think a better option would be to use a sequence as the "stop word" list.

Example...

XML input

<element>
    <title>The String Is In First Letter Caps And May Have A Word Or Words Such As A, An, Or The And And, But, For, As, At, In, Or When.</title>
</element>

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!--Sequence of lower case words.-->
    <xsl:param name="lcw" select="('A','An','The','And','But','For','Nor','Or',
        'So','Yet','As','At','By','If','In','Of','On','To','With','When')"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="title">
        <xsl:copy>
            <xsl:analyze-string select="." regex="\w+">
                <xsl:matching-substring>
                    <xsl:choose>
                        <xsl:when test=".=$lcw and not(position()=1)">
                            <xsl:value-of select="lower-case(.)"/>
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:value-of select="."/>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <xsl:value-of select="."/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>           
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

XML output

<element>
   <title>The String Is in First Letter Caps and May Have a Word or Words Such as a, an, or the and and, but, for, as, at, in, or when.</title>
</element>
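
Since the question asks for a function rather than a template, the same idea can be wrapped in an xsl:function. Matching whole words with `\w+` also avoids the problem in the question's regex, where each `\W` consumes the separator, so a stop word that directly follows another one has no leading `\W` left to match. The following is only a sketch under assumptions: the `my` prefix, its namespace URI `urn:example:title-case`, and the function name `my:proper-case` are made up for illustration, and the stop-word list is stored in lowercase so the comparison is case-insensitive (unlike the stylesheet above, this would also lowercase e.g. "THE").

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:title-case"
    exclude-result-prefixes="xs my">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Stop words, stored in lowercase (assumed list, same words as above). -->
    <xsl:param name="lcw" as="xs:string*" select="('a','an','the','and','but','for','nor','or',
        'so','yet','as','at','by','if','in','of','on','to','with','when')"/>

    <!-- Hypothetical helper: lowercases every stop word except a leading one. -->
    <xsl:function name="my:proper-case" as="xs:string">
        <xsl:param name="title" as="xs:string"/>
        <!-- Collect the words and the separators between them, in order. -->
        <xsl:variable name="parts" as="xs:string*">
            <xsl:analyze-string select="$title" regex="\w+">
                <xsl:matching-substring>
                    <xsl:sequence select="if (position() ne 1 and lower-case(.) = $lcw)
                                          then lower-case(.) else ."/>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <xsl:sequence select="."/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <xsl:sequence select="string-join($parts, '')"/>
    </xsl:function>

    <!-- Identity template: copy everything else unchanged. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="title">
        <xsl:copy>
            <xsl:value-of select="my:proper-case(.)"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

No recursion is needed here: xsl:analyze-string already visits every word once, and the function can be called from any XPath expression, e.g. `my:proper-case(string(title))`. As in the stylesheet above, the `position() ne 1` check only protects the first word when the title starts with a word character.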

Answer 1 (score: 0)

Before asking a question, you should check all similar posts. Have a look at "xslt captalize each word ignore conjunctions".