Question

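I have looked here, and as far as I understand it, the regex below just means "any sequence of Unicode characters". Can someone confirm?

Current regex: /^(?>\P{M}\p{M}*)+$/u

If I read the manual, it also says:

a) \P{M} = \PM

b) (?>\PM\pM*) = \X

So, given these two things, can I simplify the regex to the following?

Proposed regex: /^\X+$/u

I still don't quite understand...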

Answer 1

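^            # start of string, followed by
(?>          # an atomic (independent, non-backtracking) group, which does not capture, containing
    \P{M}    # a single Unicode character that is not in the `Mark` category
    \p{M}*   # 0 or more characters in the `Mark` category
)+           # with this group repeated 1 or more times
$            # end of string (or just before a final newline)

^\X+$ is equivalent: the manual defines \X as shorthand for (?>\PM\pM*). Neither pattern captures anything, because (?>...) is an atomic group, not a capturing one. Note that this equivalence is the manual's definition of \X; engines that implement \X as an extended grapheme cluster may differ on edge cases, such as a string that begins with a combining mark.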
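To make the breakdown above concrete, here is a minimal Python sketch of what the pattern accepts. It is not how any regex engine implements it; it simply walks the string and attaches each `Mark` character to the non-`Mark` character before it. The names `is_mark` and `split_clusters` are made up for illustration, and the sketch follows the simple \P{M}\p{M}* definition only.

```python
import unicodedata


def is_mark(ch):
    # General categories Mn, Mc and Me together form the `Mark` category (\p{M}).
    return unicodedata.category(ch).startswith("M")


def split_clusters(text):
    # Hypothetical helper: returns the list of "base + marks" clusters if the
    # whole string has the shape ^(?>\P{M}\p{M}*)+$, otherwise None.
    clusters = []
    for ch in text:
        if is_mark(ch):
            if not clusters:
                return None  # a mark with no base character before it
            clusters[-1] += ch  # corresponds to \p{M}*
        else:
            clusters.append(ch)  # corresponds to \P{M}, starts a new cluster
    return clusters or None  # the + quantifier needs at least one cluster


print(split_clusters("e\u0301a"))  # "e" + combining acute accent, then "a"
print(split_clusters("\u0301a"))   # starts with a combining mark
```

In other words, under this definition the pattern accepts any non-empty string that does not begin with a `Mark` character, which is close to, but not exactly, "any sequence of Unicode characters".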

Answer 2

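Yes, \P{M}\p{M}* can be simplified to \X, but not every language supports \X, whereas (in my experience) \P{M} and \p{M} are supported more often.

For example, the .NET regex engine does not support \X, and Java only added it in Java 9 (Perl supports it, of course...).

For more information, see: http://www.regular-expressions.info/unicode.html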
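One way to find out whether a given engine supports a construct is simply to try compiling it. The sketch below does this for Python's built-in `re` module, which is my own choice of example and not an engine mentioned in the question; the helper name `stdlib_re_supports` is hypothetical. As far as I know, `re` rejects unknown escapes made of a backslash and an ASCII letter, so both \X and \p{M} fail to compile there.

```python
import re


def stdlib_re_supports(pattern):
    # Hypothetical probe: True if Python's built-in `re` module can compile
    # the pattern, False if it raises a pattern error.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


for candidate in (r"\X", r"\p{M}", r"\w+"):
    print(candidate, stdlib_re_supports(candidate))
```

The same try-and-see approach applies to any other engine: check the construct before relying on it, and fall back to the longer \P{M}\p{M}* form where only the property escapes are available.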

Regular expression explained in English
