C#: trying to isolate names from HTML using a regular expression

Time: 2015-08-13 23:04:18

Tags: c# html regex title

<a href="||blablabla link||" title="||blablabla title of torrent|| torrent">||THE STRING THAT IM INTERESTED IN--NAMES||</a>

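I'm working with HTML files that contain 20-30 lines in the format above, and I want to save all the NAMES in an array list. My problem is that I can't work out the regular expression needed to get each name. What pattern should I use, and how do I use it to capture every name in this HTML string? Thanks!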

1 answer:

Answer 0 (score: 0)


Here is an example. I assume the title attribute of each link in your DOM always ends with "torrent":

    using System;
    using System.Text.RegularExpressions;

    string html = @"<a href=""/torrent/4353486/Terminator+Genisys+2015+720p+WEBRip+%5BChattChitto+RG%5D.html"" title=""view Terminator Genisys 2015 720p WEBRip [ChattChitto RG] torrent"">Terminator Genisys 2015 720p WEBRip [ChattChitto RG]</a>";
    // Group 1 captures the link text between the opening <a ...> tag and </a>.
    string pattern = @"<a\s+href=""[^""]*""\s+title=""[^""]*torrent"".*?>([^<]*)</a>";
    foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
    {
        Console.WriteLine(m.Groups[1].Value); // prints the name
    }
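The question asks for every name to end up in an array list rather than being printed. Below is a minimal sketch of that step, assuming the whole HTML file has already been read into one string and reusing the same pattern as the answer. It collects each match into a List<string>, the generic replacement for System.Collections.ArrayList. The class name TorrentNameExtractor, the method ExtractNames, and the sample links are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TorrentNameExtractor
{
    // Same pattern as the answer: an <a> tag with href, then a title attribute
    // ending in "torrent"; group 1 captures the link text (the name).
    private static readonly Regex LinkPattern = new Regex(
        @"<a\s+href=""[^""]*""\s+title=""[^""]*torrent"".*?>([^<]*)</a>",
        RegexOptions.IgnoreCase);

    // Returns every captured name in document order, with surrounding whitespace trimmed.
    public static List<string> ExtractNames(string html)
    {
        var names = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            names.Add(m.Groups[1].Value.Trim());
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample with two links; a real file could be loaded with
        // System.IO.File.ReadAllText(path) instead.
        string html =
            "<a href=\"/torrent/1/a.html\" title=\"view First Name torrent\">First Name</a>\n" +
            "<a href=\"/torrent/2/b.html\" title=\"view Second Name torrent\">Second Name</a>\n";

        foreach (string name in ExtractNames(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

Like the answer's pattern, this expects href to come before title inside each <a> tag, as in the question's sample line; links whose title does not end in "torrent" are skipped.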