Can't get rid of non-capturing regex groups

Time: 2015-02-27 01:23:21

Tags: c# html .net regex

I have the following string:

In order to take this course, you must:<br>
<br>
&radic; &nbsp; &nbsp;Have access to a computer.<br>
<br>
&radic; &nbsp; &nbsp;Have continuous broadband Internet access.<br>
<br>
&radic; &nbsp; &nbsp;Have the ability/permission to install plug-ins (e.g. Adobe Reader or Flash) and software.<br>
<br>
&radic; &nbsp; &nbsp;Have the ability to download and save files and documents to a computer.<br>
<br>
&radic; &nbsp; &nbsp;Have the ability to open Microsoft file and documents (.doc, .ppt, .xls, etc.).<br>
<br>
&radic; &nbsp; &nbsp;Be competent in the English language.<br>
<br>
&radic; &nbsp; &nbsp;Have access to a relational database management system.&nbsp; A good open-source option is MySQL (<a href="http://dev.mysql.com" target="_blank">dev.mysql.com</a>).<br>
<br>
&radic; &nbsp; &nbsp;Have completed the Discrete Structures course.<br>
<br>
&radic;&nbsp;&nbsp;&nbsp; Have read the Student Handbook.

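I am trying to select the text in the middle of each line (excluding the leading &radic;, the encoded spaces and the <br>). For example, the first match should be: Have access to a computer.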

I have tried the following two patterns, but I can't get either of them to work.

This one selects the entire line: ^(?:&radic;([(&nbsp;)|\s]*))(.*)(?:(\<br\\?\>)*)$. I tried calling Regex.Matches(requirements.InnerHtml, RequirementsExtractorRegex, RegexOptions.Multiline)[0].Captures[0].Value, and the value returned is: &radic; &nbsp; &nbsp;Have access to a computer.<br>

This one selects nothing: ^(?<=&radic;([(&nbsp;)|\s]*))(.*)(?=(\<br\\?\>)*)$

What am I doing wrong?

1 Answer

Answer 0 (score: 1)

A slight modification of the regex produces (almost; see the caveat below) the desired result:

^(?:&radic;(?:&nbsp;|\s)*)(.*)(?:<br/?>)
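The most important change is in the prefix. In the question's pattern, [(&nbsp;)|\s] is a character class, so it matches any single character from the set ( & n b s p ; ) | or whitespace, not the entity &nbsp; as a unit; it is also wrapped in a capturing group, which pushes the (.*) text into group #2. The answer's (?:&nbsp;|\s)* repeats either the whole entity or one whitespace character and captures nothing. The sketch below compares the two prefixes on a hypothetical line whose text begins with letters that happen to be in the class; the class name, method name and sample line are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixComparison
{
    // The question's prefix: the brackets form a character class, so this
    // matches any run of the single characters ( & n b s p ; ) | and whitespace.
    public const string CharClassPrefix = @"^&radic;[(&nbsp;)|\s]*";

    // The answer's prefix: a non-capturing group that repeats either the
    // whole entity "&nbsp;" or a single whitespace character.
    public const string GroupPrefix = @"^&radic;(?:&nbsp;|\s)*";

    // Returns the text the given prefix pattern consumes at the start of the line.
    public static string MatchedPrefix(string pattern, string line) =>
        Regex.Match(line, pattern).Value;

    public static void Main()
    {
        // Hypothetical line whose text starts with "s" and "n", both in the class.
        string line = "&radic; &nbsp;snapshots";
        Console.WriteLine(MatchedPrefix(CharClassPrefix, line)); // also eats "sn"
        Console.WriteLine(MatchedPrefix(GroupPrefix, line));     // stops before the text
    }
}
```

On the question's own data the difference happens to be invisible, because every item starts with an uppercase letter that is not in the class; the group form is simply the one that says what was meant.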

Then read the target text from capture group #1:

Regex.Matches(requirements.InnerHtml, RequirementsExtractorRegex, RegexOptions.Multiline)[0].Groups[1].Value
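A minimal, self-contained sketch of the answer's approach. It also shows why the question's call returned the whole line: Match.Captures[0] is the entire match (the same as Match.Value), whereas Groups[1] holds only what (.*) captured. The sample input is a shortened, hypothetical stand-in for requirements.InnerHtml, and the names RequirementsDemo, Extract and SampleHtml are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RequirementsDemo
{
    // The answer's pattern, written as a verbatim string.
    public const string RequirementsExtractorRegex =
        @"^(?:&radic;(?:&nbsp;|\s)*)(.*)(?:<br/?>)";

    // Hypothetical, shortened stand-in for requirements.InnerHtml.
    public static readonly string SampleHtml =
        "In order to take this course, you must:<br>\n" +
        "<br>\n" +
        "&radic; &nbsp; &nbsp;Have access to a computer.<br>\n" +
        "<br>\n" +
        "&radic; &nbsp; &nbsp;Be competent in the English language.<br>\n" +
        "<br>\n" +
        "&radic;&nbsp;&nbsp;&nbsp; Have read the Student Handbook.";

    public static string[] Extract(string html)
    {
        MatchCollection matches =
            Regex.Matches(html, RequirementsExtractorRegex, RegexOptions.Multiline);
        var items = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            // Captures[0] would be the whole match, prefix and <br> included;
            // Groups[1] is only the text captured by (.*).
            items[i] = matches[i].Groups[1].Value;
        }
        return items;
    }

    public static void Main()
    {
        foreach (string item in Extract(SampleHtml))
            Console.WriteLine(item);
    }
}
```

With this sample only two items come back: the final line has no <br>, so the required (?:<br/?>) cannot match there, which is the limitation described in the caveat.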

Tested on regexstorm with the Multiline option.

Caveat

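Because the br element is not optional, the regex matches every target occurrence except the last one. Quantifying that part (for example, making it optional with ?) brings the last occurrence into the matches, but then capture group #1 includes the br element that ends the line, because the greedy .* takes precedence. Adding an end-of-line anchor prevents any match at all (although, as I understand the specification, it shouldn't; possibly an artifact of the test environment?).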
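One way around all three problems, sketched as an adaptation rather than the answer's own pattern: make the text group lazy so it cannot swallow the <br>, make the <br> optional so the last line matches, and write the anchor as \r?$. The last part rests on an assumption about the anchor problem: in .NET with RegexOptions.Multiline, $ matches only before \n or at the end of the string, so if the HTML being tested had Windows \r\n line endings, the \r sitting between <br> and \n would keep <br>$ from matching. The class name, sample text and CRLF endings below are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RequirementsDemoAllLines
{
    // Adapted pattern (not the answer's original):
    //   (.*?)        lazy, so the optional <br> is not absorbed into group #1
    //   (?:<br/?>)?  optional, so the final line without <br> also matches
    //   \r?$         tolerates CRLF; with Multiline, $ matches only before \n
    public const string Pattern = @"^&radic;(?:&nbsp;|\s)*(.*?)(?:<br/?>)?\r?$";

    // Hypothetical sample using Windows line endings.
    public static readonly string SampleHtml =
        "In order to take this course, you must:<br>\r\n" +
        "<br>\r\n" +
        "&radic; &nbsp; &nbsp;Have access to a computer.<br>\r\n" +
        "<br>\r\n" +
        "&radic;&nbsp;&nbsp;&nbsp; Have read the Student Handbook.";

    // Returns the text of group #1 for every line that starts with &radic;.
    public static string[] Extract(string html) =>
        Regex.Matches(html, Pattern, RegexOptions.Multiline)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        foreach (string item in Extract(SampleHtml))
            Console.WriteLine(item);
    }
}
```

The lazy quantifier is what makes the optional <br> safe: the engine extends (.*?) one character at a time and stops at the first position where the rest of the pattern can finish the line, which is just before <br> when one is present.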