I have some computer-generated text like the following (I adjusted the whitespace to make it easier on the eyes).
<li class="activitybit forum_post">
<div class="avatar">
<img src="image.php?s=64ca7b4cc0fa2850f6c763105eee901b&u=37080&dateline=1396817868&type=thumb" alt="killathi's Avatar" />
</div>
<div class="content hasavatar">
<div class="datetime">
<span class="date">Today, <span class="time">07:14 PM</span></span>
</div>
<div class="title">
<a href="member.php?37080-killathi&s=64ca7b4cc0fa2850f6c763105eee901b">killathi</a> replied to a thread <a href="showthread.php?1016907-doodles!-Maybe-I-won-t-have-lines-in-it-this-time!!!-MUAHAHHAHAHAAHAH&s=64ca7b4cc0fa2850f6c763105eee901b">doodles! Maybe I won't have lines in it this time!!! MUAHAHHAHAHAAHAH</a> in <a href="forumdisplay.php?208-Fan-Creations&s=64ca7b4cc0fa2850f6c763105eee901b">Fan Creations</a>
</div>
<div class="excerpt">I'll hold this one here for now I guess, not really sure where to go with it lol</div>
<div class="fulllink"><a href="showthread.php?1016907-doodles!-Maybe-I-won-t-have-lines-in-it-this-time!!!-MUAHAHHAHAHAAHAH&s=64ca7b4cc0fa2850f6c763105eee901b&p=9844450#post9844450">see more</a></div>
</div>
<div class="views">77 replies | 3407 view(s)</div>
</li>
I used the regex: (?:<div class=\"title\">)((?:[\s\S]*?))(?:</div>)
The first capturing group gave me the following:
<a href="member.php?37080-killathi&s=64ca7b4cc0fa2850f6c763105eee901b">killathi</a> replied to a thread <a href="showthread.php?1016907-doodles!-Maybe-I-won-t-have-lines-in-it-this-time!!!-MUAHAHHAHAHAAHAH&s=64ca7b4cc0fa2850f6c763105eee901b">doodles! Maybe I won't have lines in it this time!!! MUAHAHHAHAHAAHAH</a> in <a href="forumdisplay.php?208-Fan-Creations&s=64ca7b4cc0fa2850f6c763105eee901b">Fan Creations</a>
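However, I would like to know whether it is possible (and if so, how) to use a regex to exclude everything inside angle brackets.
I know I need to do something with ((?:[\s\S]*?)), but I'm not sure what.
(It is safe to assume that all of the text is in this format.)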
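Answer 0 (score: 2)
To remove the angle brackets and everything inside them, just replace matches of this regex with an empty string:
<[^>]*>
Like this: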
string output = Regex.Replace(input, "<[^>]*>", "");
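To tie this back to the question, here is a minimal sketch that combines both steps: capture the contents of the title div, then strip the tags from the captured fragment. The names `TitleExtractor` and `ExtractTitleText` are hypothetical, the entity decoding at the end is an optional extra not mentioned above, and the sketch assumes the input always follows the format shown in the question.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TitleExtractor
{
    // Hypothetical helper: returns the visible text of the first <div class="title">,
    // or an empty string when no such div is found.
    public static string ExtractTitleText(string html)
    {
        // Step 1: capture everything between the opening tag and the first </div>.
        Match match = Regex.Match(html, "<div class=\"title\">([\\s\\S]*?)</div>");
        if (!match.Success)
        {
            return string.Empty;
        }

        // Step 2: remove every <...> tag from the captured fragment.
        string text = Regex.Replace(match.Groups[1].Value, "<[^>]*>", "");

        // Optional: decode entities such as &amp; and trim surrounding whitespace.
        return WebUtility.HtmlDecode(text).Trim();
    }
}
```

Doing this in two passes keeps each pattern simple. Folding the tag exclusion into the capturing group itself would not work well, because a single group always captures one contiguous span of the input.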
Answer 1 (score: 2)
I suggest using this library: HTML Agility Pack
You can extract the text as simply as this:
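using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(yourHtml);
// SelectSingleNode returns null when nothing matches the XPath expression.
var node = doc.DocumentNode.SelectSingleNode("//div[@class='title']");
string result = node?.InnerText.Trim();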
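Answer 2 (score: 1)
I think Regex.Replace might do it, but manipulating HTML with regex is very hard in the general case. Here is a fiddle that demonstrates the use of (<.+?>). It works for your example, but I make no guarantees!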
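One concrete case where the missing guarantee shows up: in .NET, `.` does not match a line feed by default, so `<.+?>` skips any tag that is split across lines. The sketch below compares the pattern with and without `RegexOptions.Singleline`, and with the negated character class `<[^>]*>`. The class and method names are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Lazy dot pattern from this answer; '.' does not match '\n' by default.
    public static string StripLazyDot(string html) =>
        Regex.Replace(html, "<.+?>", "");

    // Same pattern with Singleline, so '.' also matches '\n'.
    public static string StripLazyDotSingleline(string html) =>
        Regex.Replace(html, "<.+?>", "", RegexOptions.Singleline);

    // Negated character class; [^>] matches line breaks without any option.
    public static string StripNegatedClass(string html) =>
        Regex.Replace(html, "<[^>]*>", "");
}
```

For an input such as "<a\nhref=\"x\">text</a>", the first method leaves the opening tag in place, while the other two return just the text. All three still break if a literal > appears inside an attribute value, which is one reason a real HTML parser is the safer choice for arbitrary markup.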