How to get all characters between HTML tags

Time: 2013-11-08 19:02:38

Tags: c# html regex

I want to grab everything between the div tags on our forum so that I can process it in a program. The fetched page looks like this:

<div id="post_message_1234567">

        <a href="http://blahblah.com" target="_blank"><img src="http://blahblah.com/iuhiuhuh.gif" border="0" alt="" /></a> <br />
<br />
jofjhoeifjoiwefjoweifj<br />
 blahblahblahpokpoekpfowef<br />
<br />
khfiudhfisduhfiusdfh<br />
<br />
<a href="http://blah.com/img.php?image=trepazoid.jpg" target="_blank"><img src="http://blah.com/loc367/euhfwieufhwifuhiwefuh.jpg" border="0" alt="" /></a><br />
<br />
one<br />
 two*three<br />
 87879879 nuts<br />
 11 bananas<br />
<br />
<a href="hjoiwjhfoweif.dat" target="_blank">Monkeys</a>
        </div>

I tried this regular expression, but it did not match:

string find = "\\b<div id=\"post_message_\\d+\">\\n*.*</div>\\b";

Could you help me capture everything between <div id="post_message_1234567"> and </div>?

1 answer:

Answer 0: (score: 1)

How about this:

@"<div id=""post_message_\d+"">(?<Content>(\r|\n|.)*)</div>"

Example:

using System.Text.RegularExpressions;

string searchString = @"<div id=""post_message_1234567"">

        <a href=""http://blahblah.com"" target=""_blank""><img src=""http://blahblah.com/iuhiuhuh.gif"" border=""0"" alt="""" /></a> <br />
<br />
jofjhoeifjoiwefjoweifj<br />
 blahblahblahpokpoekpfowef<br />
<br />
khfiudhfisduhfiusdfh<br />
<br />
<a href=""http://blah.com/img.php?image=trepazoid.jpg"" target=""_blank""><img src=""http://blah.com/loc367/euhfwieufhwifuhiwefuh.jpg"" border=""0"" alt="""" /></a><br />
<br />
one<br />
 two*three<br />
 87879879 nuts<br />
 11 bananas<br />
<br />
<a href=""hjoiwjhfoweif.dat"" target=""_blank"">Monkeys</a>
        </div>";
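// (\r|\n|.)* matches any character, line breaks included; the named group captures the div's inner HTML.
Regex regex = new Regex(@"<div id=""post_message_\d+"">(?<Content>(\r|\n|.)*)</div>");
Match match = regex.Match(searchString);
bool success = match.Success; // True
// Inner HTML of the div, including the leading and trailing whitespace.
string content = match.Groups["Content"].Value;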

content now contains everything between the tags you wanted.
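
Two notes on the patterns above. The pattern in the question fails for two reasons. First, `\b` needs a word character right next to `<` and `>`, which is rarely the case in page markup. Second, `.` does not match `\n` by default, so `.*` cannot span several lines. The pattern in this answer is greedy, so on a page with several posts it would run on to the last `</div>` in the input. The following is only a minimal sketch of one possible variant, assuming each post div contains no nested `<div>` elements. It uses a lazy quantifier together with `RegexOptions.Singleline`. The class name `PostExtractor` and the method `ExtractPosts` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class PostExtractor
{
    // RegexOptions.Singleline lets '.' match '\n' as well.
    // The lazy quantifier (.*?) stops at the first </div> after the opening tag.
    // Assumption: a post body contains no nested <div> elements.
    private static readonly Regex PostRegex = new Regex(
        @"<div id=""post_message_\d+"">(?<Content>.*?)</div>",
        RegexOptions.Singleline);

    // Returns the trimmed inner HTML of every post_message div found in html.
    public static List<string> ExtractPosts(string html)
    {
        var posts = new List<string>();
        foreach (Match match in PostRegex.Matches(html))
        {
            posts.Add(match.Groups["Content"].Value.Trim());
        }
        return posts;
    }
}
```

Calling `PostExtractor.ExtractPosts(searchString)` on the sample above should return a list with a single entry. If the posts can contain nested divs, a real HTML parser (for example the HtmlAgilityPack library) is likely a more robust choice than a regular expression.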