C# regex to match a set of non-repeating characters

Date: 2017-04-20 10:38:52

Tags: c# regex

This is inspired by another question (for which I have already accepted a non-regex solution): c# regex match set of characters in any order only once

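However, the solution from @Dmitry Egorov is more elegant, and I am still struggling to work out whether this can be solved with a single regex. The closest I have got is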

^(.|\n)*<\[SG (?!.*(.).*\2)[msbrelft]+\]>(.|\n)*$

The text to match looks like this:

ID-CFI Location 02h displays sector protection status for the sector selected by the sector address (SA) used in the ID-CFI enter
command. To read the protection status of more than one sector it is necessary to exit the ID ASO and enter the ID ASO using the
new SA. <[SG sbl]>
Page mode read between ID locations other than 02h is supported.

I use this check in C#:

if (!Regex.IsMatch(obj.Object_Text, format.Value))
...
...

In words, the match should behave as follows:

- if this exists anywhere in text <[SG sbl]> including over \n or \r\n
- letters should be in this group of letters [msbrelft]
- must be minimum one letter, eg. <[SG s]>
- can be up to all from group, eg. <[SG sbl]>
- must be only one letter (no duplicates), eg. <[SG sbsl]> is NOT good

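I don't want to extract groups. I only need to validate the whole text, that is, check whether it contains a <[SG xx..]> tag that follows the rules explained above.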

What I have come up with so far, and what is driving me crazy, is

^(.|\n)*<\[SG (?!.*(.).*\2)[msbrelft]+\]>(.|\n)*$

It fails to validate when the same character appears twice after the group I am interested in (with no \r\n or \n in between).

So, for example, this works (there is a \n or \r\n right after the group):

ID-CFI Location 02h displays sector protection status for the sector selected by the sector address (SA) used in the ID-CFI enter
command. To read the protection status of more than one sector it is necessary to exit the ID ASO and enter the ID ASO using the
new SA. <[SG sbl]>
Page mode read between ID locations other than 02h is supported.

And this does not (there are two spaces after my group):

ID-CFI Location 02h displays sector protection status for the sector selected by the sector address (SA) used in the ID-CFI enter
command. To read the protection status of more than one sector it is necessary to exit the ID ASO and enter the ID ASO using the
new SA. <[SG sbl]>  Page mode read between ID locations other than 02h is supported.

Any help would be appreciated! Thanks.

2 answers:

Answer 0: (score: 1)

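First, if you only want to find one <[SG xxx]> tag that follows the rules in order to validate the string, you don't need to describe the whole string in the pattern.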

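The problem with your pattern is that the negative lookahead can check characters outside the substring delimited by the square brackets. To avoid that, replace the dot with a negated character class that excludes the closing square bracket: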

<\[SG (?![^\]]*([^\]])[^\]]*\1)[msbrelft]+\]>

You can also write it like this:

<\[SG (?:([msbrelft])(?![^\]]*?\1))+\]>
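
To see the difference, here is a minimal sketch that runs the question's pattern and the two patterns above against a shortened version of the sample text. The class name `SgTagCheck`, the constant names and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SgTagCheck
{
    // Pattern from the question: the lookahead uses '.', so it scans past "]>".
    public const string Original = @"^(.|\n)*<\[SG (?!.*(.).*\2)[msbrelft]+\]>(.|\n)*$";

    // First pattern above: the lookahead cannot cross the closing ']'.
    public const string Bounded = @"<\[SG (?![^\]]*([^\]])[^\]]*\1)[msbrelft]+\]>";

    // Second pattern above: each letter is checked as it is consumed.
    public const string PerLetter = @"<\[SG (?:([msbrelft])(?![^\]]*?\1))+\]>";

    public static void Main()
    {
        // Hypothetical, shortened samples with text after the tag on the same line.
        string sameLine = "new SA. <[SG sbl]>  Page mode read is supported.";
        string duplicate = "new SA. <[SG sbsl]>  Page mode read is supported.";

        Console.WriteLine(Regex.IsMatch(sameLine, Original));   // False
        Console.WriteLine(Regex.IsMatch(sameLine, Bounded));    // True
        Console.WriteLine(Regex.IsMatch(sameLine, PerLetter));  // True
        Console.WriteLine(Regex.IsMatch(duplicate, Bounded));   // False
        Console.WriteLine(Regex.IsMatch(duplicate, PerLetter)); // False
    }
}
```

The original pattern rejects `sameLine` because its lookahead finds repeated characters in the text that follows the tag, while the bounded versions only look at what sits between `<[SG ` and `]`.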

Answer 1: (score: 1)

Replacing (.|\n)* with [\S\s]* seems to work.
\S: anything that is not whitespace
\s: space, tab, newline, ...

^[\S\s]*<\[SG (?!\w*(\w)\w*\1)[beflmrst]+\]>[\S\s]*$

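In addition, the negative lookahead that rules out duplicates now uses \w instead of . (the dot).
Since ] is not a word character, the lookahead cannot run past it.
\w: a word character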

Or, as Wiktor pointed out, if you pass RegexOptions.Singleline to the Regex constructor, the regex can be golfed down to:

^.*<\[SG (?!\w*(\w)\w*\1)[beflmrst]+\]>.*$
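
As a rough sketch of how the option changes the outcome, the snippet below runs that anchored pattern with and without `RegexOptions.Singleline`. The class name `SinglelineDemo`, the helper `IsValid` and the sample text are assumptions for illustration. The static `Regex.IsMatch` overload is used here in place of the constructor and takes the same option.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    public const string Anchored = @"^.*<\[SG (?!\w*(\w)\w*\1)[beflmrst]+\]>.*$";

    // With Singleline, '.' also matches '\n', so ^.* and .*$ can span all lines.
    public static bool IsValid(string text) =>
        Regex.IsMatch(text, Anchored, RegexOptions.Singleline);

    public static void Main()
    {
        // Hypothetical multi-line sample with the tag on the second line.
        string text = "first line\nnew SA. <[SG sbl]>  Page mode read\nlast line";

        // Without the option, '.' stops at the first '\n' and the tag is never reached.
        Console.WriteLine(Regex.IsMatch(text, Anchored)); // False
        Console.WriteLine(IsValid(text));                 // True
    }
}
```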

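Anyway, from the other answer I noticed that you really only want to search for that SG tag, rather than match the whole text whenever it contains the tag.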

So in the end, this will do:

<\[SG (?!\w*(\w)\w*\1)[beflmrst]+\]>
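
A minimal sketch of plugging that search pattern into a check like the one in the question could look as follows. The class `SgTagSearch` and the method `HasValidTag` are hypothetical names, and the sample strings stand in for `obj.Object_Text`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SgTagSearch
{
    // Unanchored search: no need to describe the text around the tag.
    private static readonly Regex Tag =
        new Regex(@"<\[SG (?!\w*(\w)\w*\1)[beflmrst]+\]>");

    public static bool HasValidTag(string text) => Tag.IsMatch(text);

    public static void Main()
    {
        Console.WriteLine(HasValidTag("new SA. <[SG sbl]>\nPage mode read")); // True
        Console.WriteLine(HasValidTag("new SA. <[SG s]>  Page mode read"));   // True
        Console.WriteLine(HasValidTag("new SA. <[SG sbsl]> Page mode read")); // False
        Console.WriteLine(HasValidTag("new SA. <[SG sbx]> Page mode read"));  // False
    }
}
```

Because the pattern is unanchored, line breaks before or after the tag do not matter and no RegexOptions are needed.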