I have a string:
Test.
<div>
<table style="color:blue;"><tbody><!--START SPACE COMMENTS SUMMARY-->
<tr><td colspan="2">SPACE COMMENTS SUMMARY</td></tr>
<tr><td style="min-width:200px;">Area/Room</td>
<td style="max-width:300px;text-align:left;">Comments</td>
</tr><tr><td style="min-width:200px;">Bathroom</td>
<td style="max-width:300px;text-align:left;">Some comment</td></tr>
<!--END SPACE COMMENTS SUMMARY--></tbody></table>
<div>
<table style="color:blue;"><tbody><!--START SPACE SUMMARY-->
<tr><td colspan="2">SPACE SUMMARY</td></tr><tr>
<td style="min-width:200px;">Space</td>
<td style="max-width:300px;text-align:right;">Installed Price</td></tr>
<tr><td style="min-width:200px;">Bathroom</td>
<td style="max-width:300px;text-align:right;">$2,355.97</td></tr>
<!--END SPACE SUMMARY--></tbody></table>
<br><br><br><div>Some text.</div></div></div>
I want to use a regex to select the table that contains the comments <!--START SPACE SUMMARY--> and <!--END SPACE SUMMARY-->.
I tried @"<table.*?><tbody.*?><!--START SPACE SUMMARY-->.*?<!--END SPACE SUMMARY--></tbody></table>", but it selects both tables in the string.
EDIT: My question isn't really about HTML. The same question applies if I have the string:
some text blah blah one some text blah blah two.
and I want to select "some text blah blah two" with the pattern "some text.*?two".
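A regex engine reports the leftmost position where a match succeeds. A lazy .*? only makes the match end as early as possible from that starting point; it never moves the start. The following minimal C# sketch (top-level statements, assuming .NET 6 or later) shows this on the string from the edit:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "some text blah blah one some text blah blah two.";

// The engine tries index 0 first. "some text" matches there, and the lazy .*?
// keeps expanding until it reaches "two", so the match starts at the FIRST "some text".
Match lazy = Regex.Match(input, "some text.*?two");

Console.WriteLine(lazy.Index); // 0
Console.WriteLine(lazy.Value); // some text blah blah one some text blah blah two
```

The HTML case behaves the same way. The match starts at the first <table because a successful match exists from there, and the lazy quantifiers simply stretch until <!--START SPACE SUMMARY--> appears.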
Answer 0 (score: 1)
using System;
using System.Text.RegularExpressions;

string test = @"Test.
<div>
<table style=""color:blue;""><tbody><!--START SPACE COMMENTS SUMMARY-->
<tr><td colspan=""2"">SPACE COMMENTS SUMMARY</td></tr>
<tr><td style=""min-width:200px;"">Area/Room</td>
<td style=""max-width:300px;text-align:left;"">Comments</td>
</tr><tr><td style=""min-width:200px;"">Bathroom</td>
<td style=""max-width:300px;text-align:left;"">Some comment</td></tr>
<!--END SPACE COMMENTS SUMMARY--></tbody></table>
<div>
<table style=""color:blue;""><tbody><!--START SPACE SUMMARY-->
<tr><td colspan=""2"">SPACE SUMMARY</td></tr><tr>
<td style=""min-width:200px;"">Space</td>
<td style=""max-width:300px;text-align:right;"">Installed Price</td></tr>
<tr><td style=""min-width:200px;"">Bathroom</td>
<td style=""max-width:300px;text-align:right;"">$2,355.97</td></tr>
<!--END SPACE SUMMARY--></tbody></table>
<br><br><br><div>Some text.</div></div></div>";
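// Singleline makes "." match newlines, so the pattern can span the multi-line table.
MatchCollection matches = Regex.Matches(test, @"<table(?!.*<table).*?<!--START SPACE SUMMARY-->.*?<!--END SPACE SUMMARY-->.*?table>", RegexOptions.Singleline);
foreach (Match match in matches)
    Console.WriteLine(match.Value);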
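The idea is to use the negative lookahead (?!.*<table) so that the regex engine rejects any <table that has another <table after it. Because RegexOptions.Singleline lets .* run to the end of the input, this picks the last table in the string, which here is the SPACE SUMMARY table.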
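With RegexOptions.Singleline, the .* inside (?!.*<table) can reach the end of the input. The lookahead therefore checks everything after the candidate <table, not just the table itself. The following minimal sketch uses a hypothetical, shortened version of the two tables (top-level statements, .NET 6 or later assumed) and covers both orders:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, shortened stand-ins for the two tables in the question.
const string comments = "<table><tbody><!--START SPACE COMMENTS SUMMARY--><tr><td>Bathroom</td></tr><!--END SPACE COMMENTS SUMMARY--></tbody></table>";
const string summary = "<table><tbody><!--START SPACE SUMMARY--><tr><td>$2,355.97</td></tr><!--END SPACE SUMMARY--></tbody></table>";

const string pattern = @"<table(?!.*<table).*?<!--START SPACE SUMMARY-->.*?<!--END SPACE SUMMARY-->.*?table>";

// Target table last: the first <table is skipped and the match is exactly the summary table.
Match last = Regex.Match(comments + "\n" + summary, pattern, RegexOptions.Singleline);
Console.WriteLine(last.Value == summary); // True

// Another table after the target: the lookahead now rejects the target's <table as well.
Match notLast = Regex.Match(summary + "\n" + comments, pattern, RegexOptions.Singleline);
Console.WriteLine(notLast.Success); // False
```

This approach relies on the SPACE SUMMARY table being the last table in the input.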
Answer 1 (score: 1)
Let's focus on the non-HTML problem you have: matching the closest window between two delimiters. Use a tempered greedy token:
(?s)some text(?:(?!some text|two).)*two
Here some text is the 1st delimiter, (?:(?!some text|two).)* is the tempered greedy (TG) token, and two is the 2nd delimiter.
See the regex demo.
For HTML parsing, use HtmlAgilityPack. It will make life easier for everyone who maintains the code.
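The (?s) modifier enables DOTALL mode, in which . matches any character including a newline. The tempered greedy token (?:(?!some text|two).)* matches any character that does not start the literal character sequence some text or two.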
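Here is a minimal C# sketch of the tempered greedy token (top-level statements, .NET 6 or later assumed). It first applies the token to the string from the question's edit. It then applies a hypothetical adaptation to a shortened version of the tables, using <table as the sequence the token may not cross. That second pattern is not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// 1) The string from the question's edit.
string text = "some text blah blah one some text blah blah two.";

// "." is consumed only at positions where neither "some text" nor "two" begins,
// so a match starting at the first "some text" cannot cross the second one.
Match m = Regex.Match(text, @"(?s)some text(?:(?!some text|two).)*two");
Console.WriteLine(m.Value); // some text blah blah two

// 2) Hypothetical adaptation to the tables: the token may not cross another "<table",
//    so a match attempt that starts at an earlier table fails before reaching the target.
string target = "<table><tbody><!--START SPACE SUMMARY--><tr><td>$2,355.97</td></tr><!--END SPACE SUMMARY--></tbody></table>";
string html = "<table><tbody><!--START SPACE COMMENTS SUMMARY--></tbody></table>\n<div>" + target + "</div>";
Match t = Regex.Match(html, @"(?s)<table(?:(?!<table).)*<!--START SPACE SUMMARY-->.*?</table>");
Console.WriteLine(t.Value == target); // True
```

Unlike the lookahead in Answer 0, the restriction applies only to the text between the opening <table and the start comment. The result therefore does not depend on where the table sits in the input.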
Answer 2 (score: 0)
Try this:
<table.*?><tbody.*?><!--START (SPACE SUMMARY)--><.*?<!--END \1--><\/tbody><\/table>
It still uses non-greedy quantifiers, but it also uses the backreference \1 to repeat the value of group 1. It also escapes / as \/. Maybe that's the source of the problem.
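In .NET, / is not a regex metacharacter, so \/ and / match the same character. The backreference \1 simply repeats the text captured by (SPACE SUMMARY). The following minimal sketch tries this pattern in C# and prints where the match begins (top-level statements, .NET 6 or later assumed). It assumes the start comment is written <!--START SPACE SUMMARY--> as in the input, and it uses a hypothetical, shortened version of that input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, shortened version of the question's input.
string html = "<table><tbody><!--START SPACE COMMENTS SUMMARY--></tbody></table>\n" +
              "<div><table><tbody><!--START SPACE SUMMARY--><tr><td>Bathroom</td></tr><!--END SPACE SUMMARY--></tbody></table></div>";

// The answer's pattern with the start comment closed as "-->"; \1 repeats "SPACE SUMMARY".
string pattern = @"<table.*?><tbody.*?><!--START (SPACE SUMMARY)-->.*?<!--END \1--><\/tbody><\/table>";
Match m = Regex.Match(html, pattern, RegexOptions.Singleline);

Console.WriteLine(m.Groups[1].Value); // SPACE SUMMARY
// Lazy quantifiers do not move the start: <tbody.*?> stretches across the first table.
Console.WriteLine(m.Index); // 0
```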