Question

The following Groovy commands illustrate my problem.

First, this works as expected (as seen on lotrepls.appspot.com); note that \u0061 is 'a'.

>>> print "a".matches(/\u0061/)

true

Now suppose we want to match \n using the Unicode escape \u000A. Writing the pattern as a string literal fails, which is expected:

>>> print "\n".matches("\u000A");

Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting anything but ''\n''; got it anyway
@ line 1, column 21. 1 error

This is expected because, in Java at least, Unicode escapes are processed early (JLS 3.3), so:

print "\n".matches("\u000A")

is really the same as:

print "\n".matches("
")

The fix is to escape the Unicode escape and let the regex engine handle it, as follows:

>>> print "\n".matches("\\u000A")

true
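
To see why this works, it can help to inspect what the string literal actually holds. The following is a minimal sketch, assuming a Groovy version that behaves like the session above; the variable name pattern is only illustrative.

```groovy
// Illustrative name only; it does not appear in the original session.
// The doubled backslash is an ordinary string escape, so the literal
// holds six characters: a backslash, 'u', '0', '0', '0', 'A'.
def pattern = "\\u000A"
assert pattern.length() == 6
assert pattern.startsWith('\\')

// java.util.regex resolves the escape itself and matches a line feed.
assert "\n".matches(pattern)
```

The compiler never sees a raw Unicode escape here, so the line feed is only introduced once the regex engine parses the pattern.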

Now here is the actual question: how can we get this to work with the Groovy /pattern/ syntax instead of a string literal?

Here are some failed attempts:

>>> print "\n".matches(/\u000A/)

Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting EOF, found '(' @ line 1, column 19.
1 error

>>> print "\n".matches(/\\u000A/)

false

>>> print "\\u000A".matches(/\\u000A/);

true

Answer 1

~"[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]"

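seems to work as it should. According to the documentation I have seen, a slashy string should not require double backslashes, so I don't know why the compiler is unhappy with them.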

Answer 2

First, Groovy seems to have changed in this regard in the meantime. At least on https://groovyconsole.appspot.com/ and in a local Groovy shell, "\n".matches(/\u000A/) works just fine and evaluates to true.

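If you run into a similar situation again, just encode the backslash itself with a Unicode escape, as in "\n".matches(/\u005Cu000A/). The Unicode-escape-to-character conversion turns it back into a backslash, and the sequence is then preserved for the regex parser.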

Another option is to separate the backslash from the u, for example with "\n".matches(/${'\\'}u000A/) or "\n".matches('\\' + /u000A/).
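
These two variants build the pattern without ever writing a backslash directly before the u in source, so they should not depend on how a given Groovy version treats Unicode escapes inside slashy strings. A minimal sketch follows; the variable names are illustrative and not part of the original answer.

```groovy
// Illustrative names; neither appears in the original answer.
// Interpolate the backslash so it never sits next to the 'u' in source.
def interpolated = /${'\\'}u000A/
// Or concatenate a one-backslash string with a slashy string.
def concatenated = '\\' + /u000A/

// Both hold the same six characters.
assert interpolated.toString() == concatenated
assert concatenated.length() == 6

// The regex engine resolves the escape and matches a line feed.
assert "\n".matches(interpolated.toString())
assert "\n".matches(concatenated)
```

The explicit toString() calls only make the GString-to-String conversion visible; the calls in the answer rely on Groovy performing that coercion implicitly.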

How to escape Unicode escapes in Groovy's /pattern/ syntax
