Output nested regex groups as nested XML using xsl:analyze-string

Time: 2012-11-01 23:56:37

Tags: xml regex xslt xslt-2.0

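I have a regular expression with nested capturing groups, and I want to output nested XML that mirrors those groups, the way fn:analyze-string does. Here is a simple example: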

Regex

((Luckenbach|Houston|Little Rock),\s(TX|AK))

Input

Let's go to Luckenbach, TX with Waylon and Willie and the boys.

Desired output

<s:analyze-string-result xmlns:s="http://www.w3.org/2009/xpath-functions/analyze-string">
    <s:non-match>Let's go to </s:non-match>
    <s:match>
        <s:group nr="1">
            <s:group nr="2">Luckenbach</s:group>, <s:group nr="3">TX</s:group>
        </s:group>
    </s:match>
    <s:non-match> with Waylon and Willie and the boys.</s:non-match>
</s:analyze-string-result>

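The problem is that there seems to be no way to recursively process the regex-group() values inside the xsl:matching-substring of xsl:analyze-string (or to access them as XML, the way XQuery's fn:analyze-string allows).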

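The solution needs to be generic enough to work with different regular expressions, many of which have several levels of nested capturing groups.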

1 Answer:

Answer 0 (score: 2)

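When the context node contains the sample text, the following produces nested output equivalent to the desired result, using location, city and state elements in place of the generic group elements: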

    <snip>
        <xsl:analyze-string 
                select="." 
                regex="((Luckenbach|Houston|Little Rock),\s(TX|AK))">
            <xsl:matching-substring>
                <location>
                    <city><xsl:value-of select="regex-group(2)"/></city>
                    <xsl:text>, </xsl:text>
                    <state><xsl:value-of select="regex-group(3)"/></state>
                </location>                       
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>    
    </snip>

If you want <snip> to be generated only when the regex matches, you can adjust the regex slightly and handle the additional groups:

        <xsl:analyze-string 
                select="." 
                regex="((.*)((Luckenbach|Houston|Little Rock),\s(TX|AK))(.*))">
            <xsl:matching-substring>
                <snip>
                    <xsl:value-of select="regex-group(2)"/>
                    <location>
                        <city><xsl:value-of select="regex-group(4)"/></city>
                        <xsl:text>, </xsl:text>
                        <state><xsl:value-of select="regex-group(5)"/></state>
                    </location>
                    <xsl:value-of select="regex-group(6)"/>
                </snip>   
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string> 

If you want to approximate the behavior of the XQuery analyze-string() function, you can define your own custom function:

<xsl:function name="my:analyze-string" as="item()*" xmlns:my="http://stackoverflow.com/questions/13187307/output-nested-regex-groups-as-nested-xml-using-xslanalyze-string">
    <xsl:param name="val" />

    <analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">   
        <xsl:analyze-string select="$val" regex="((.*)((Luckenbach|Houston|Little Rock),\s(TX|AK))(.*))">
            <xsl:matching-substring>
                <xsl:for-each select="1 to 6">
                    <xsl:if test="regex-group(.)">
                        <match>
                            <group  nr="{.}">
                                <xsl:value-of select="regex-group(.)"/>
                            </group>
                        </match>
                    </xsl:if>
                </xsl:for-each>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <non-match>
                    <xsl:value-of select="."/>
                </non-match> 
            </xsl:non-matching-substring>
        </xsl:analyze-string>    
    </analyze-string-result>   
</xsl:function>

When called like this:

 <xsl:variable name="value" 
      select='"Let&apos;s go to Luckenbach, TX with Waylon and Willie and the boys."'/>
 <xsl:copy-of select="my:analyze-string($value)"
    xmlns:my="http://stackoverflow.com/questions/13187307/output-nested-regex-groups-as-nested-xml-using-xslanalyze-string"/>  

It produces the following output:

<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions"
                       xmlns:my="http://stackoverflow.com/questions/13187307/output-nested-regex-groups-as-nested-xml-using-xslanalyze-string">
   <match>
      <group nr="1">Let's go to Luckenbach, TX with Waylon and Willie and the boys.</group>
   </match>
   <match>
      <group nr="2">Let's go to </group>
   </match>
   <match>
      <group nr="3">Luckenbach, TX</group>
   </match>
   <match>
      <group nr="4">Luckenbach</group>
   </match>
   <match>
      <group nr="5">TX</group>
   </match>
   <match>
      <group nr="6"> with Waylon and Willie and the boys.</group>
   </match>
</analyze-string-result>
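
The output above is flat: each group is reported separately rather than nested inside its parent group, because regex-group() only returns strings. As a minimal sketch, assuming an XSLT 3.0 processor is available (the question is tagged xslt-2.0, so this is an assumption), the standard XPath 3.0 function analyze-string() can be called directly and takes the regex as an ordinary string argument. The variable names `regex` and `value` are illustrative only:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: requires an XSLT 3.0 / XPath 3.0 processor. -->

  <!-- The original regex from the question, passed as a plain string. -->
  <xsl:variable name="regex"
      select="'((Luckenbach|Houston|Little Rock),\s(TX|AK))'"/>

  <!-- The sample input from the question. -->
  <xsl:variable name="value"
      select='"Let&apos;s go to Luckenbach, TX with Waylon and Willie and the boys."'/>

  <!-- Runs against any input document and copies the result tree
       built by the built-in function to the output. -->
  <xsl:template match="/">
    <xsl:copy-of select="analyze-string($value, $regex)"/>
  </xsl:template>

</xsl:stylesheet>
```

The result is an analyze-string-result element in the http://www.w3.org/2005/xpath-functions namespace, whose match elements contain group elements nested in the same way as the capturing groups in the regex. Because the regex is a normal function argument, the same template should work for other expressions without hard-coding the number of groups.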