I am trying to match any non-alphanumeric character with a Unicode-aware regex pattern, and I tried to combine [\u00D8-\u00F6] and [^\w'’-], to no avail.
I have this text: right ស្ដាំ sdam. When I enter [^\w'’-] in the Find and Replace dialog, it matches the non-alphanumeric characters, but also parts of the non-English word (ាំ and ្). I don't want those diacritics to be matched.
When I enter [\u00D8-\u00F6], it does not match English characters, but it still matches some non-English characters and the decorating parts of words such as ាំ and ្.
Answer 0 (score: 0)
You cannot rely on the default Boost.Regex engine; it seems to be poorly implemented in EmEditor.
Go to Advanced and change the regular expression engine to Onigmo.
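Then use [^\p{L}\p{M}\p{N}] (or [^\p{L}\p{M}\p{N}'’-]+ to match runs of them in one go while excluding ', ’ and -, which may be parts of words), or any other regex you were using: the Unicode category classes will start working.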
Note that \w is not Unicode-aware here, so you need to use \p{L}\p{M}\p{N} instead:
\p{L} - any Unicode letter from the BMP
\p{M} - any diacritic mark
\p{N} - any Unicode digit
More can be found in the UnicodeProps.txt file.
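
To see why the mark category matters, here is a minimal sketch in Python (not EmEditor or Onigmo). It assumes the sample word is the code point sequence U+179F U+17D2 U+178A U+17B6 U+17C6. Python's built-in re module has no \p{...} classes, so the helper names is_word_char and non_word_chars are hypothetical stand-ins that emulate [^\p{L}\p{M}\p{N}'’-] with unicodedata.category.

```python
import re
import unicodedata

# Sample from the question, written with escapes so the code points are explicit
# (assumed sequence: SA, COENG, DA, VOWEL SIGN AA, NIKAHIT).
SAMPLE = "right \u179f\u17d2\u178a\u17b6\u17c6 sdam"

# Characters that may be part of a word: apostrophe, right single quote, hyphen.
EXTRA_WORD_CHARS = "'\u2019-"


def is_word_char(ch):
    """Emulate [\\p{L}\\p{M}\\p{N}'’-]: letter, mark, number, or an extra char."""
    major_category = unicodedata.category(ch)[0]  # e.g. "Lo" -> "L", "Mn" -> "M"
    return major_category in "LMN" or ch in EXTRA_WORD_CHARS


def non_word_chars(text):
    """Return every character that the negated class would match."""
    return [ch for ch in text if not is_word_char(ch)]


# The \w-based pattern from the question, for comparison. Python's \w covers
# letters and digits but not combining marks, so the Khmer signs are matched.
naive_matches = re.findall(r"[^\w'\u2019-]", SAMPLE)

# The category-based check treats the combining marks as part of the word.
category_matches = non_word_chars(SAMPLE)

print([f"U+{ord(ch):04X}" for ch in naive_matches])
print([f"U+{ord(ch):04X}" for ch in category_matches])
```

With the \w-based pattern, the three Khmer combining signs show up among the matches next to the spaces. The category-based check leaves only the two spaces, because those signs belong to the mark categories (Mn and Mc).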