Question

I want to retrieve the URL from an <a> tag in an HTML document. Here is the tag:

<a href="index.php?option=com_remository&amp;Itemid=43&amp;func=fileinfo&amp;id=49"><img src="http://dziekanat.wzim.sggw.pl/components/com_remository/images/file_icons/New.gif" width="16" height="16" border="0" align="middle" alt="file_icons/New.gif"/><b>&nbsp;Plan STAC lato 2014_15</b></a>

After parsing, I should get:

index.php?option=com_remository&Itemid=43&func=fileinfo&id=49

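What regular expression should I use?

I want to do this with a regular expression because the HTML document itself is very old and has no IDs to refer to. Therefore, I can't use any more sophisticated tool (like Html Agility Pack).

The whole document can be found here: http://dziekanat.wzim.sggw.pl/index.php?option=com_remository&Itemid=43&func=select&id=2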

Answer 1

"Therefore, I can't use any more sophisticated tool (like Html Agility Pack)."

Why not? This works for me:


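This returns the links in the document (the snippet walks the parsed <a> elements with LINQ rather than an XPath query):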

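using System.Linq;
using System.Net;

// Download the page and parse it with Html Agility Pack.
var html = new WebClient().DownloadString("http://dziekanat.wzim.sggw.pl/index.php?option=com_remository&Itemid=43&func=select&id=2");
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Collect the href of every <a> element, skipping anchors that have no href.
var links = doc.DocumentNode.Descendants("a")
            .Where(a => a.Attributes["href"] != null)
            .Select(a => a.Attributes["href"].Value)
            .ToList();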

Answer 2

Here you go:

string Pattern = @"<a[^>]*?href\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
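
As a rough sketch of how this pattern might be applied, the class below runs it over an HTML string and collects the first capture group of every match. The names AnchorHref and ExtractHrefs are hypothetical, and two steps are assumptions added here rather than part of the answer: RegexOptions.IgnoreCase (so that <A HREF=...> also matches) and WebUtility.HtmlDecode (the raw attribute still contains &amp;, while the question asks for plain &).

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class AnchorHref
{
    // Pattern copied from the answer; group 1 captures the href value.
    private const string Pattern =
        @"<a[^>]*?href\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";

    // Hypothetical helper: returns the decoded href of every matching tag.
    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            // Decode HTML entities so that "&amp;" becomes "&".
            hrefs.Add(WebUtility.HtmlDecode(m.Groups[1].Value));
        }
        return hrefs;
    }
}
```

Calling AnchorHref.ExtractHrefs on the tag from the question should yield a single entry, index.php?option=com_remository&Itemid=43&func=fileinfo&id=49. Note that the pattern starts with a bare <a, so it can also match other tags whose names begin with "a" and carry an href, such as <area>. It also requires a space or quote after the value, so an unquoted href directly followed by > is not captured.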

How to parse a URL from an anchor tag
