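In a set of more than 10,000 lines of text, I need to find every line in which the space after an HTML tag is missing. Only a limited set of HTML tags is involved, listed below.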
<b> </b>, <em> </em>, <span style="text-decoration: underline;" data-mce-style="text-decoration: underline;"> </span>
<sub> </sub>, <sup> </sup>, <ul> </ul>, <li> </li>, <ol> </ol>
After running the regex, a string like the following should appear in the results.
Hi <b>all</b>good morning.
In this case, the space after the closing bold tag is missing.
Answer 0 (score: 3)
Assuming C#:
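using System.Collections.Specialized;
using System.Text.RegularExpressions;
// subjectString holds the full text to search.
StringCollection resultList = new StringCollection();
Regex regexObj = new Regex("^.*<(?:/?b|/?em|/?su[pb]|/?[ou]l|/?li|span style=\"text-decoration: underline;\" data-mce-style=\"text-decoration: underline;\"|/span)>(?! ).*$", RegexOptions.Multiline);
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    // Each match is one whole line that contains a listed tag not followed by a space.
    resultList.Add(matchResult.Value);
    matchResult = matchResult.NextMatch();
}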
This returns every line in the file where at least one of the listed tags is not followed by a space.
Input:
This </b> is <b> OK
This <b> is </b>not OK
Neither <b>is </b> this.
Output:
This <b> is </b>not OK
Neither <b>is </b> this.
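With more than 10,000 lines, it may be more useful to know where each offending line is than to get only its text. The following is a minimal sketch, not part of the original answer, that checks the text line by line with the same tag list and reports 1-based line numbers. The helper name FindOffendingLines and the sample string are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same tag list as in the answer; the <span> attributes are matched literally.
const string Tags =
    "/?b|/?em|/?su[pb]|/?[ou]l|/?li|" +
    "span style=\"text-decoration: underline;\" data-mce-style=\"text-decoration: underline;\"|/span";

// A listed tag that is not followed by a space.
var missingSpace = new Regex("<(?:" + Tags + ")>(?! )");

// Hypothetical helper: returns (1-based line number, line text) for each offending line.
List<(int Number, string Text)> FindOffendingLines(string text)
{
    var hits = new List<(int Number, string Text)>();
    string[] lines = text.Split('\n');
    for (int i = 0; i < lines.Length; i++)
    {
        string line = lines[i].TrimEnd('\r'); // tolerate Windows line endings
        if (missingSpace.IsMatch(line))
            hits.Add((i + 1, line));
    }
    return hits;
}

// The three input lines from the example above.
string sample = "This </b> is <b> OK\nThis <b> is </b>not OK\nNeither <b>is </b> this.";
var offending = FindOffendingLines(sample);
foreach (var (number, text) in offending)
    Console.WriteLine($"{number}: {text}");
```

Note that a tag at the very end of a line is also reported, because nothing (and therefore no space) follows it. If that is not wanted, replacing the lookahead (?! ) with (?=\S) would, as far as I can tell, report only tags that are directly followed by a non-whitespace character.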
Explanation:
^ # Start of line
.* # Match any number of characters except newlines
< # Match a <
(?: # Either match a...
/?b # b or /b
| # or
/?em # em or /em
|... # etc. etc.
) # End of alternation
> # Match a >
(?! ) # Assert that no space follows
.* # Match any number of characters until...
$ # End of line
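The breakdown above can also be kept inside the pattern itself. This is a sketch, not part of the original answer, that uses RegexOptions.IgnorePatternWhitespace so the regex can be spread over several lines with # comments. In that mode unescaped whitespace in the pattern is ignored, so every literal space (in the lookahead and in the span attributes) is written as [ ]. The variable names and the sample string are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string: a double quote is written as "" and line breaks are kept,
// so each # comment ends at the end of its own line.
var verbose = new Regex(@"
    ^                 # Start of line
    .*                # Any number of characters except newlines
    <                 # A literal <
    (?:               # One of the listed tags:
        /?b           #   b or /b
      | /?em          #   em or /em
      | /?su[pb]      #   sub, sup, /sub or /sup
      | /?[ou]l       #   ol, ul, /ol or /ul
      | /?li          #   li or /li
      | span[ ]style=""text-decoration:[ ]underline;""[ ]data-mce-style=""text-decoration:[ ]underline;""
      | /span         #   closing span
    )                 # End of alternation
    >                 # A literal >
    (?![ ])           # Assert that no space follows
    .*                # Rest of the line
    $                 # End of line
    ", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

// The three input lines from the example in the answer.
string sample = "This </b> is <b> OK\nThis <b> is </b>not OK\nNeither <b>is </b> this.";
MatchCollection matches = verbose.Matches(sample);
foreach (Match m in matches)
    Console.WriteLine(m.Value);
```

The pattern is meant to behave like the single-line version in the answer; only the layout differs.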