RegEx does not accept %

Time: 2012-02-14 03:06:09

Tags: regex unicode character-properties

What is wrong with this RegEx: /^[\p{L}\p{N}]+/u? When my user enters %openminded, the regex returns false. I need it to accept these formats:

  

  %openminded
  100%openminded
  openminded 100%

What do I need to add to the expression so that it accepts the input even when the user types % or any other special character first?

1 answer:

Answer 0: (score: 5)

The percent sign is not a \pS symbol. It is a \pP punctuation character, as explained by uniprops:

$ uniprops %
U+0025 ‹%› \N{PERCENT SIGN}
    \pP \p{Po}
    All Any ASCII Assigned Basic_Latin Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn POSIX_Graph POSIX_Print POSIX_Punct Print Punctuation X_POSIX_Graph X_POSIX_Print X_POSIX_Punct
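
To see where the original pattern stops, here is a minimal sketch using Python's standard unicodedata module instead of the regex engine. The helper name first_rejected is hypothetical, and the check only approximates the class [\p{L}\p{N}] by looking at the first letter of each character's general category. Because the pattern is anchored with ^ and needs at least one matching character, a rejected character at index 0 means no match at all.

```python
import unicodedata


def first_rejected(text: str, allowed_majors: str = "LN"):
    """Return (index, char, category) for the first character whose major
    general category is not in allowed_majors, or None if every character
    is allowed. "LN" approximates the class [\\p{L}\\p{N}]."""
    for i, ch in enumerate(text):
        gc = unicodedata.category(ch)  # two-letter category, e.g. "Po"
        if gc[0] not in allowed_majors:
            return i, ch, gc
    return None


for sample in ["%openminded", "100%openminded", "openminded 100%"]:
    print(repr(sample), first_rejected(sample))
```

Only the first sample is rejected at index 0, which is consistent with %openminded being the input that returns false.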

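You should get familiar with the general categories (and perhaps the scripts) that your favorite characters belong to. Here is some sample output from running unichars: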

$ unichars -gs '[\pP\pS]' '\p{Block=Basic_Latin}'
U+0021 ‭ !  GC=Po SC=Common       EXCLAMATION MARK
U+0022 ‭ "  GC=Po SC=Common       QUOTATION MARK
U+0023 ‭ #  GC=Po SC=Common       NUMBER SIGN
U+0024 ‭ $  GC=Sc SC=Common       DOLLAR SIGN
U+0025 ‭ %  GC=Po SC=Common       PERCENT SIGN
U+0026 ‭ &  GC=Po SC=Common       AMPERSAND
U+0027 ‭ '  GC=Po SC=Common       APOSTROPHE
U+0028 ‭ (  GC=Ps SC=Common       LEFT PARENTHESIS
U+0029 ‭ )  GC=Pe SC=Common       RIGHT PARENTHESIS
U+002A ‭ *  GC=Po SC=Common       ASTERISK
U+002B ‭ +  GC=Sm SC=Common       PLUS SIGN
U+002C ‭ ,  GC=Po SC=Common       COMMA
U+002D ‭ -  GC=Pd SC=Common       HYPHEN-MINUS
U+002E ‭ .  GC=Po SC=Common       FULL STOP
U+002F ‭ /  GC=Po SC=Common       SOLIDUS
U+003A ‭ :  GC=Po SC=Common       COLON
U+003B ‭ ;  GC=Po SC=Common       SEMICOLON
U+003C ‭ <  GC=Sm SC=Common       LESS-THAN SIGN
U+003D ‭ =  GC=Sm SC=Common       EQUALS SIGN
U+003E ‭ >  GC=Sm SC=Common       GREATER-THAN SIGN
U+003F ‭ ?  GC=Po SC=Common       QUESTION MARK
U+0040 ‭ @  GC=Po SC=Common       COMMERCIAL AT
U+005B ‭ [  GC=Ps SC=Common       LEFT SQUARE BRACKET
U+005C ‭ \  GC=Po SC=Common       REVERSE SOLIDUS
U+005D ‭ ]  GC=Pe SC=Common       RIGHT SQUARE BRACKET
U+005E ‭ ^  GC=Sk SC=Common       CIRCUMFLEX ACCENT
U+005F ‭ _  GC=Pc SC=Common       LOW LINE
U+0060 ‭ `  GC=Sk SC=Common       GRAVE ACCENT
U+007B ‭ {  GC=Ps SC=Common       LEFT CURLY BRACKET
U+007C ‭ |  GC=Sm SC=Common       VERTICAL LINE
U+007D ‭ }  GC=Pe SC=Common       RIGHT CURLY BRACKET
U+007E ‭ ~  GC=Sm SC=Common       TILDE
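
If unichars is not installed, a similar listing can be approximated with Python's standard unicodedata module. This is only a sketch: the function name ascii_punct_and_symbols is made up for illustration, it covers the Basic Latin block only, and it omits the script column because unicodedata does not expose script data.

```python
import unicodedata


def ascii_punct_and_symbols():
    """Collect Basic Latin characters whose general category is
    punctuation (P*) or symbol (S*), roughly like [\\pP\\pS]."""
    rows = []
    for cp in range(0x80):
        ch = chr(cp)
        gc = unicodedata.category(ch)
        if gc[0] in "PS":
            # Control characters have no name, but they are category Cc
            # and never reach this branch.
            rows.append((cp, ch, gc, unicodedata.name(ch)))
    return rows


for cp, ch, gc, name in ascii_punct_and_symbols():
    print(f"U+{cp:04X} {ch}  GC={gc}  {name}")
```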

So either add the right general category to your character class, like

 [\pL\pN\p{Po}]

or just add the specific characters you need. By the way, anyone who wants \pL almost always wants \pM as well.
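
As a rough illustration of the widened class with \pM included, the sketch below emulates ^[\pL\pM\pN\p{Po}]+ using Python's standard unicodedata module, since Python's built-in re module does not support \p{...} properties. The names leading_run and accepts are hypothetical. Note that Po is broad (it also covers characters such as ! and ?), so listing specific characters may be the stricter choice.

```python
import unicodedata


def is_allowed(ch: str) -> bool:
    """True for letters (L*), marks (M*), numbers (N*) and Other_Punctuation (Po)."""
    gc = unicodedata.category(ch)
    return gc[0] in "LMN" or gc == "Po"


def leading_run(text: str) -> str:
    """Longest prefix made only of allowed characters, i.e. what a
    start-anchored class with + would consume."""
    end = 0
    while end < len(text) and is_allowed(text[end]):
        end += 1
    return text[:end]


def accepts(text: str) -> bool:
    """Emulate a successful match: at least one allowed leading character."""
    return len(leading_run(text)) > 0


for sample in ["%openminded", "100%openminded", "openminded 100%", " leading space"]:
    print(repr(sample), accepts(sample), repr(leading_run(sample)))
```

Like the original pattern, this is anchored only at the start, so openminded 100% is accepted on the strength of its prefix; the space ends the run. Requiring the whole input to match would need an end anchor and an explicit decision about whitespace.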