Regular expression to match whitespace, punctuation, and other non-alphanumeric characters

Time: 2019-05-09 12:53:35

Tags: c# regex split

I have this regular expression:

(\s+)|([.,!?:;'\"\'-])

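This matches any whitespace, or any of the punctuation characters listed in the set, but it does not match all punctuation, so I am struggling to find a pattern that matches any punctuation character.

I considered matching any character that is not alphanumeric, but that causes problems with accented letters and different alphabets.

Is there something that covers all punctuation (and by punctuation I mean symbols as well), for example: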

  

`,。#@

2 answers:

Answer 0: (score: 2)

You can use [\p{P}\p{S}]:

(\s+)|([\p{P}\p{S}])

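[\p{P}\p{S}] matches any single character in a Unicode punctuation category or a Unicode symbol category.

See the list of subproperties that these classes refer to: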
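As a rough sketch of how this pattern behaves with Regex.Split in C# (the class name PunctuationSplitDemo, the helper Tokenize, and the sample string are made up for illustration, not taken from the question):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PunctuationSplitDemo
{
    // Same pattern as above; a verbatim string avoids doubling the backslashes.
    private static readonly Regex Separator = new Regex(@"(\s+)|([\p{P}\p{S}])");

    // Hypothetical helper: splits on whitespace, punctuation, and symbols.
    // Because the pattern has capturing groups, Regex.Split also returns the
    // captured separators, so empty and whitespace-only pieces are dropped here.
    public static string[] Tokenize(string input) =>
        Separator.Split(input)
                 .Where(piece => !string.IsNullOrWhiteSpace(piece))
                 .ToArray();

    public static void Main()
    {
        // Accented letters stay inside their words, while , # @ and 。
        // come back as separate one-character tokens:
        // café , naïve # tag @ user 。 ok
        Console.WriteLine(string.Join(" ", Tokenize("café, naïve #tag @user。ok")));
    }
}
```

If the separators themselves are not wanted in the result, making both groups non-capturing, as in (?:\s+)|(?:[\p{P}\p{S}]), should keep them out of the array.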

Punctuation
Pc  Punctuation, connector      Includes "_" underscore
Pd  Punctuation, dash           Includes several hyphen characters
Ps  Punctuation, open           Opening bracket characters
Pe  Punctuation, close          Closing bracket characters
Pi  Punctuation, initial quote  Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage
Pf  Punctuation, final quote    Closing quotation mark. May behave like Ps or Pe depending on usage
Po  Punctuation, other

Symbol
Sm  Symbol, math                Mathematical symbols (e.g., +, −, =, ×, ÷, √, ∊). Does not include parentheses and brackets, which are in categories Ps and Pe. Also does not include !, *, -, or /, which despite frequent use as mathematical operators, are primarily considered to be "punctuation".
Sc  Symbol, currency            Currency symbols
Sk  Symbol, modifier    
So  Symbol, other
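To check which of these categories a particular character falls into, something like the following might help. It is only a sketch: CategoryDemo, CategoryOf, and IsPunctuationOrSymbol are illustrative names, and the sample characters are arbitrary.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CategoryDemo
{
    // Returns the general category .NET reports for a character,
    // e.g. DashPunctuation for Pd or CurrencySymbol for Sc.
    public static UnicodeCategory CategoryOf(char c) =>
        CharUnicodeInfo.GetUnicodeCategory(c);

    // True when the character belongs to any P* or S* category.
    public static bool IsPunctuationOrSymbol(char c) =>
        Regex.IsMatch(c.ToString(), @"^[\p{P}\p{S}]$");

    public static void Main()
    {
        // '_' is Pc, '-' is Pd, '(' is Ps, '$' is Sc, '+' is Sm,
        // '`' is Sk, and 'a' is a lowercase letter (no match).
        foreach (char c in "_-($+`a")
            Console.WriteLine($"{c} -> {CategoryOf(c)}, matches: {IsPunctuationOrSymbol(c)}");
    }
}
```

A single subcategory can also be used on its own in a pattern, for example \p{Sc} for currency symbols only.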

Answer 1: (score: 0)

If you don't need to match the underscore, you can use

\W

which matches any character that is not a digit, a letter, or an underscore. Note that this also matches whitespace.
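A small sketch of what \W does and does not match in .NET, assuming the default options (without RegexOptions.ECMAScript, \w is Unicode-aware, so accented letters count as word characters). NonWordDemo and IsNonWord are illustrative names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonWordDemo
{
    // True when the single character is matched by \W.
    public static bool IsNonWord(char c) =>
        Regex.IsMatch(c.ToString(), @"^\W$");

    public static void Main()
    {
        // 'é' and '_' are word characters, so \W does not match them;
        // ',' and '@' are matched, and so is the space.
        foreach (char c in "é_,@ ")
            Console.WriteLine($"'{c}' -> {IsNonWord(c)}");
    }
}
```

So \W avoids the accented-letter problem from the question, at the cost of letting the underscore through.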