Get some links from an HTML string

Time: 2017-01-28 14:50:42

Tags: c#

I have a string with content like this:

<a href="http://example.com/2014/06/22/new-idea-about-life.zip">One</a>
<a href="http://example.com/2014/06/22/new-idea-about-life-rar.rar">Two</a>

I need this output:

http://example.com/2014/06/22/new-idea-about-life.zip
http://example.com/2014/06/22/new-idea-about-life-rar.rar

1 answer:

Answer 0: (score: 0)

HTML Agility Pack is a good library for parsing HTML in C#.

An example of extracting the URLs:

using System.Collections.Generic;
using HtmlAgilityPack;

var html = "<a href=\"http://reallife.com/2014/06/22/new-idea-about-life.zip\">New idea about life (zip) (25MB)</a><a href=\"http://reallife.com/2014/06/22/new-idea-about-life-rar.rar\">New idea about life (rar) (23MB)</a>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var links = new List<string>();
// SelectNodes returns null (not an empty collection) when nothing matches.
var anchors = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
if (anchors != null)
{
    foreach (var link in anchors)
    {
        links.Add(link.GetAttributeValue("href", string.Empty));
    }
}
// do something with the links inside the links list
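
For completeness, here is a minimal sketch of the same approach wrapped in a small console program that prints the links from the question. It assumes the HtmlAgilityPack NuGet package is installed. The names LinkExtractor and ExtractHrefs are hypothetical helpers introduced for illustration; they are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class LinkExtractor
{
    // Returns the href value of every <a> element that has an href attribute.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null when no node matches the XPath expression.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (var anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}

public static class Program
{
    public static void Main()
    {
        // The sample markup from the question.
        var html =
            "<a href=\"http://example.com/2014/06/22/new-idea-about-life.zip\">One</a>\n" +
            "<a href=\"http://example.com/2014/06/22/new-idea-about-life-rar.rar\">Two</a>";

        // Print one link per line.
        foreach (var href in LinkExtractor.ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

With the sample input this should print the two URLs on separate lines, which is the output the question asks for. Returning an empty list when there are no links lets callers loop over the result without a null check.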