<a href="||blablabla link||" title="||blablabla title of torrent|| torrent">||THE STRING THAT IM INTERESTED IN--NAMES||</a>
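I am working with an HTML file that contains 20-30 lines in the format above, and I want to save all of the NAMES in an ArrayList. My problem is that I do not understand regular expression syntax well enough to extract each name. What pattern should I use, and how do I use it to capture every name in this HTML string? Thanks!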
Answer 0 (score: 0)
Here is an example. I am assuming that the title attribute of each link in your DOM always ends with "torrent".

using System;
using System.Text.RegularExpressions;

string html = @"<a href=""/torrent/4353486/Terminator+Genisys+2015+720p+WEBRip+%5BChattChitto+RG%5D.html"" title=""view Terminator Genisys 2015 720p WEBRip [ChattChitto RG] torrent"">Terminator Genisys 2015 720p WEBRip [ChattChitto RG]</a>";
// Group 1 captures the link text between the opening <a ...> tag and </a>.
string pattern = @"<a\s+href=""[^""]*""\s+title=""[^""]*torrent"".*?>([^<]*)</a>";
foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
{
    Console.WriteLine(m.Groups[1].Value);
}
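The question asks for the names in a list rather than printed to the console. A minimal sketch of that variation follows, reusing the same pattern and adding each captured group to a `List<string>`. The two-link sample input is hypothetical and only mirrors the format in the question. The code uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input: two links in the format shown in the question.
string html =
    @"<a href=""/torrent/1/First.html"" title=""view First Name torrent"">First Name</a>" + "\n" +
    @"<a href=""/torrent/2/Second.html"" title=""view Second Name torrent"">Second Name</a>";

// Same pattern as in the answer; group 1 is the link text.
string pattern = @"<a\s+href=""[^""]*""\s+title=""[^""]*torrent"".*?>([^<]*)</a>";

// Collect every captured name instead of printing it.
var names = new List<string>();
foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
{
    names.Add(m.Groups[1].Value.Trim());
}

Console.WriteLine(string.Join(", ", names));
```

If the real file's links carry extra attributes between `href` and `title`, or place them in a different order, the pattern would need adjusting. It matches only the attribute layout shown in the question.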