我正在尝试使用HTML agility pack让我的程序读取文件并从中获取所有图像srcs。这是我到目前为止所得到的:
private ArrayList GetImageLinks(String html,String link)
{
//link = url of webpage
//html = a string of the html, just for testing will remove after
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.Load(link);
List<String> imgs = (from x in htmlDoc.DocumentNode.Descendants()
where x.Name.ToLower() == "img"
select x.Attributes["src"].Value).ToList<String>();
Console.Out.WriteLine("Hey");
ArrayList imageLinks = new ArrayList(imgs);
foreach (String element in imageLinks)
{
Console.WriteLine(element);
}
return imageLinks;
}
这是我得到的错误: System.ArgumentException:不支持URI格式。
答案 0 :(得分:6)
HtmlDocument docHtml = new HtmlWeb().Load(url);