Matching non-alphanumeric characters excluding diacritics in EmEditor

Date: 2016-08-31 17:09:46

Tags: regex match combinations emeditor

I am trying to match any non-alphanumeric character with a Unicode-aware regex pattern, and I tried to combine [\u00D8-\u00F6] and [^\w'’-], to no avail.

I have this: right ស្ដាំ sdam. When I write [^\w'’-] in the Find and Replace dialog, it matches the non-alphanumeric characters but also parts of the non-English word (ាំ and ). I don't want those diacritics to be matched.

When I write [\u00D8-\u00F6], it does not match English characters, but it does match some non-English characters and decorated parts of words such as ាំ and .

1 answer:

Answer 0 (score: 0)

You cannot rely on the default Boost.Regex engine; it seems to be poorly implemented in EmEditor.

Go to Advanced and change the regular expression engine to Onigmo.


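Then use [^\p{L}\p{M}\p{N}] (or [^\p{L}\p{M}\p{N}'’-]+ to match whole runs of them at once while excluding characters such as ' and - that may be part of a word), or any other regex you use; the Unicode category classes will start working.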
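To make the effect of the second pattern concrete outside EmEditor, here is a minimal Python sketch that emulates [^\p{L}\p{M}\p{N}'’-]+ with the standard `unicodedata` module, since Python's built-in `re` has no \p{...} classes. The helper names `is_word_char` and `non_word_runs` are made up for this illustration, and the escaped Khmer string is assumed to spell the sample word from the question.

```python
import unicodedata

# Major Unicode categories treated as "word" material:
# L = letters, M = combining marks (diacritics), N = numbers.
WORD_CATEGORIES = ("L", "M", "N")

# Extra characters the pattern keeps inside words: ' ’ -
EXTRA_WORD_CHARS = {"'", "\u2019", "-"}


def is_word_char(ch):
    """True if ch would NOT be matched by [^\\p{L}\\p{M}\\p{N}'’-]."""
    return ch in EXTRA_WORD_CHARS or unicodedata.category(ch)[0] in WORD_CATEGORIES


def non_word_runs(text):
    """Return maximal runs of non-word characters, like findall with '+'."""
    runs, current = [], []
    for ch in text:
        if is_word_char(ch):
            if current:
                runs.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        runs.append("".join(current))
    return runs


# Assumed spelling of the question's sample: "right <Khmer word> sdam".
sample = "right \u179f\u17d2\u178a\u17b6\u17c6 sdam"
print(non_word_runs(sample))
```

With this emulation only the two spaces are reported for the sample; the Khmer combining signs stay attached to their word because they fall under the mark category.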


Note that \w is not Unicode-aware, so you need to use \p{L}\p{M}\p{N}:

  • \p{L} - any Unicode letter from the BMP
  • \p{M} - any diacritic mark
  • \p{N} - any Unicode digit
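To see which of these categories each character of the sample word belongs to, a small Python check with the standard `unicodedata` module can help. This is only an inspection aid, not something EmEditor runs; the function name `categories` is hypothetical, and the escaped string is assumed to be the Khmer word from the question.

```python
import unicodedata


def categories(text):
    """Return the Unicode general category of every character in text."""
    return [unicodedata.category(ch) for ch in text]


# Assumed code points of the Khmer word in the question.
word = "\u179f\u17d2\u178a\u17b6\u17c6"

for ch, cat in zip(word, categories(word)):
    # A category starting with L is covered by \p{L}, M by \p{M}, N by \p{N}.
    print(f"U+{ord(ch):04X}  {cat}  {unicodedata.name(ch)}")
```

Characters whose category starts with M are the combining signs that a letters-only class misses, which is why \p{M} has to be part of the character class.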

More properties can be found in the UnicodeProps.txt file.