I want to grab everything between the div tags on our forum pages so I can process it in a program. A fetched page looks like this:
<div id="post_message_1234567">
<a href="http://blahblah.com" target="_blank"><img src="http://blahblah.com/iuhiuhuh.gif" border="0" alt="" /></a> <br />
<br />
jofjhoeifjoiwefjoweifj<br />
blahblahblahpokpoekpfowef<br />
<br />
khfiudhfisduhfiusdfh<br />
<br />
<a href="http://blah.com/img.php?image=trepazoid.jpg" target="_blank"><img src="http://blah.com/loc367/euhfwieufhwifuhiwefuh.jpg" border="0" alt="" /></a><br />
<br />
one<br />
two*three<br />
87879879 nuts<br />
11 bananas<br />
<br />
<a href="hjoiwjhfoweif.dat" target="_blank">Monkeys</a>
</div>
I tried this regular expression, but it doesn't match:
string find = "\\b<div id=\"post_message_\\d+\">\\n*.*</div>\\b";
Could you help me capture everything between <div id="post_message_1234567">
and </div>
?
Answer 0 (score: 1)
How about this:
@"<div id=""post_message_\d+"">(?<Content>(\r|\n|.)*)</div>"
Example:
string searchString = @"<div id=""post_message_1234567"">
<a href=""http://blahblah.com"" target=""_blank""><img src=""http://blahblah.com/iuhiuhuh.gif"" border=""0"" alt="""" /></a> <br />
<br />
jofjhoeifjoiwefjoweifj<br />
blahblahblahpokpoekpfowef<br />
<br />
khfiudhfisduhfiusdfh<br />
<br />
<a href=""http://blah.com/img.php?image=trepazoid.jpg"" target=""_blank""><img src=""http://blah.com/loc367/euhfwieufhwifuhiwefuh.jpg"" border=""0"" alt="""" /></a><br />
<br />
one<br />
two*three<br />
87879879 nuts<br />
11 bananas<br />
<br />
<a href=""hjoiwjhfoweif.dat"" target=""_blank"">Monkeys</a>
</div>";
Regex regex = new Regex(@"<div id=""post_message_\d+"">(?<Content>(\r|\n|.)*)</div>");
Match match = regex.Match(searchString);
bool success = match.Success; // True
string content = match.Groups["Content"].Value;
content now contains everything between the tags you asked for.
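As for why the pattern in the question fails: by default . does not match a newline, so \n*.* can only cover a single line of the post, and the \b anchors demand a word character directly before < and directly after >, which is not the case in the sample. Note also that (\r|\n|.)* in the pattern above is greedy, so on a page holding several posts it would likely run through to the last </div>. A minimal sketch of a lazy variant follows, assuming the post body contains no nested div (the PostExtractor class, the ExtractPosts method, and the shortened two-post sample string are made up for illustration); for arbitrary HTML, a real HTML parser would be the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PostExtractor
{
    // RegexOptions.Singleline lets '.' match '\n' as well; the lazy '.*?'
    // stops at the first </div> that follows each opening tag.
    // Assumption: no nested <div> inside a post body.
    private static readonly Regex PostRegex = new Regex(
        @"<div id=""post_message_\d+"">(?<Content>.*?)</div>",
        RegexOptions.Singleline);

    // Returns the trimmed inner content of every post_message div found.
    public static List<string> ExtractPosts(string html)
    {
        var posts = new List<string>();
        foreach (Match m in PostRegex.Matches(html))
        {
            posts.Add(m.Groups["Content"].Value.Trim());
        }
        return posts;
    }

    public static void Main()
    {
        // Hypothetical two-post page, shortened from the sample above.
        string html =
            "<div id=\"post_message_1\">\none<br />\ntwo*three\n</div>\n" +
            "<div id=\"post_message_2\">\n11 bananas\n</div>";

        foreach (string post in ExtractPosts(html))
        {
            Console.WriteLine("[" + post + "]");
        }
    }
}
```

Each post comes back as a separate list entry, so the calling code can process them one at a time instead of receiving one large match.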