Question

On http://gskinner.com/RegExr/ (a RegEx testing site), this regular expression matches as expected: [^\x00-\xff]
Sample text: test123 或元件数据不可用

But suppose I have this input XML:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <node>test123 或元件数据不可用</node>
</root>

and I try this XSLT 2.0 stylesheet with Saxon 9:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root/node">
    <xsl:if test="matches(., '[^\x00-\xff]')">
      <xsl:text>Text has chinese characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

Saxon 9 gives the following error output:

    FORX0002: Error at character 3 in regular expression "[^\x00-\xff]": invalid escape sequence
  Failed to compile stylesheet. 1 error detected.

How can I check for Chinese characters in XSLT 2.0?

Answer 1

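The regular expression dialect supported in XPath is based on the one defined in XSD. You can find the full specification in the W3C documents or, if you prefer something more readable, in my XSLT 2.0 Programmer's Reference. Don't assume that all regex dialects are the same. There is no \x escape in XPath regexes because the dialect is designed to be embedded in XML, which already offers &#xHHHH; character references.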
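
As a rough illustration of that last point, the pattern from the question can be rewritten with XML character references in place of the \x escapes. This is only a sketch and not part of the original answer. It assumes that "any character outside Latin-1" is really what is wanted, and it uses a positive range starting at U+0100 because most character references below U+0020 are not legal in XML 1.0. Note that it matches every character above U+00FF, not only Chinese ones.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root/node">
    <!-- Rough equivalent of [^\x00-\xff]: any character from U+0100 to U+10FFFF. -->
    <!-- The XML parser expands the &#x...; references before the regex engine sees the pattern. -->
    <xsl:if test="matches(., '[&#x100;-&#x10FFFF;]')">
      <xsl:text>Text has non-Latin-1 characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```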

You may find it more convenient to use named Unicode blocks, for example \p{IsCJKUnifiedIdeographs}, rather than hexadecimal ranges.

See also: What's the complete range for Chinese characters in Unicode?

Answer 2

With Michael Kay's help I can answer my own question. Thanks, Michael! The solution works, but in my opinion the long list of Unicode ranges does not look very pretty.

This XSLT prints a text message if the regular expression finds any Chinese character in the given XML:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root/node">
    <xsl:if test="matches(.,'[&#x4E00;-&#x9FFF;&#x3400;-&#x4DBF;&#x20000;-&#x2A6DF;&#xF900;-&#xFAFF;&#x2F800;-&#x2FA1F;]')">
      <xsl:text>Text has chinese characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

The same solution with named Unicode blocks:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root/node">
    <xsl:if test="matches(., '[\p{IsCJKUnifiedIdeographs}\p{IsCJKUnifiedIdeographsExtensionA}\p{IsCJKUnifiedIdeographsExtensionB}\p{IsCJKCompatibilityIdeographs}\p{IsCJKCompatibilityIdeographsSupplement}]')">
      <xsl:text>Text has chinese characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
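
If the length of the pattern is the main annoyance, one option is to keep it in a single place and call it through a stylesheet function. The following is a minimal sketch under that assumption, not something from the answers above: the function name local:has-cjk, the variable name cjk-pattern, and the namespace urn:example:local are all made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs local">

  <!-- Hypothetical global variable holding the same block list as above. -->
  <xsl:variable name="cjk-pattern" as="xs:string"
      select="'[\p{IsCJKUnifiedIdeographs}\p{IsCJKUnifiedIdeographsExtensionA}\p{IsCJKUnifiedIdeographsExtensionB}\p{IsCJKCompatibilityIdeographs}\p{IsCJKCompatibilityIdeographsSupplement}]'"/>

  <!-- Hypothetical helper: true if the string contains at least one character from those blocks. -->
  <xsl:function name="local:has-cjk" as="xs:boolean">
    <xsl:param name="text" as="xs:string?"/>
    <xsl:sequence select="matches($text, $cjk-pattern)"/>
  </xsl:function>

  <xsl:template match="/root/node">
    <!-- The node is atomized and converted to xs:string when passed to the function. -->
    <xsl:if test="local:has-cjk(.)">
      <xsl:text>Text has chinese characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The templates then stay short, and adding or removing a block only requires changing the variable.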
