Question

<a href="||blablabla link||" title="||blablabla title of torrent|| torrent">||THE STRING THAT IM INTERESTED IN--NAMES||</a>

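I am working with an HTML file that contains 20-30 lines in the format shown above. I want to save all of the NAMES in a list. My problem is that I cannot work out the regular expression needed to get each name. What pattern should I use, and how do I use it to capture every name in this HTML string? Thanks!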

Answer 1


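Here is an example. I am assuming that the title attribute of each link in your DOM always ends with "torrent":

using System;
using System.Text.RegularExpressions;

string html = @"<a href=""/torrent/4353486/Terminator+Genisys+2015+720p+WEBRip+%5BChattChitto+RG%5D.html"" title=""view Terminator Genisys 2015 720p WEBRip [ChattChitto RG] torrent"">Terminator Genisys 2015 720p WEBRip [ChattChitto RG]</a>";
// Match an <a> tag whose title ends with "torrent"; group 1 captures the link text.
string pattern = @"<a\s+href=""[^""]*""\s+title=""[^""]*torrent"".*?>([^<]*)</a>";
foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
{
    // Groups[1] holds the text between <a ...> and </a>, i.e. the name.
    Console.WriteLine(m.Groups[1].Value);
}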
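The question asks for the names to be stored in a list rather than printed. A minimal sketch of that, reusing the same pattern, might look like the following. The class name `NameExtractor` and the method name `ExtractNames` are made up for illustration, and the sketch still assumes every title attribute ends with "torrent".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class NameExtractor
{
    // Same pattern as in the answer: an <a> tag whose title ends with
    // "torrent"; group 1 captures the link text.
    private static readonly Regex LinkPattern = new Regex(
        @"<a\s+href=""[^""]*""\s+title=""[^""]*torrent"".*?>([^<]*)</a>",
        RegexOptions.IgnoreCase);

    // Returns the link text of every matching <a> tag, in document order.
    public static List<string> ExtractNames(string html)
    {
        var names = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            names.Add(m.Groups[1].Value.Trim());
        }
        return names;
    }
}
```

Calling `NameExtractor.ExtractNames(html)` on the whole file contents should return one entry per matching link. Links whose attributes appear in a different order, or whose text contains nested tags, would not match this pattern.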

C#: trying to isolate names from HTML using a regular expression
