Question

I am trying to learn how to get all the img src values from a URL. However, the imgs variable in my code is always null. What am I doing wrong?

static void Main(string[] args)
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml("http://archive.ncsa.illinois.edu/primer.html");
    HtmlAgilityPack.HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img");

    if (imgs != null)
    {
        foreach (HtmlAgilityPack.HtmlNode img in imgs)
        {
            string imgSrc = img.Attributes["src"].Value;
        }
    }

    Console.ReadKey();
}

Answer 1

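You are using HtmlDocument.LoadHtml, which is designed to take HTML source, not a URL. The URL string itself gets parsed as if it were markup, so the document contains no img elements and SelectNodes("//img") returns null.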

You can use WebClient to fetch the HTML first, for example:

WebClient wc = new WebClient();
string html = wc.DownloadString("http://archive.ncsa.illinois.edu/primer.html");
doc.LoadHtml(html);
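
Combined with the code from the question, a minimal sketch of the whole program might look like the following. It assumes the HtmlAgilityPack package is referenced and that the URL is still reachable. Using GetAttributeValue instead of the indexer is my own substitution, so that an img tag without a src attribute does not throw. Printing each value is also an addition for illustration.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        // Download the page source first; LoadHtml expects markup, not a URL.
        string html;
        using (WebClient wc = new WebClient())
        {
            html = wc.DownloadString("http://archive.ncsa.illinois.edu/primer.html");
        }

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so the null check stays.
        HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img");
        if (imgs != null)
        {
            foreach (HtmlNode img in imgs)
            {
                // Returns the fallback (empty string) if the src attribute is missing.
                string imgSrc = img.GetAttributeValue("src", string.Empty);
                Console.WriteLine(imgSrc);
            }
        }

        Console.ReadKey();
    }
}
```

Note that WebClient is marked obsolete on recent .NET versions; it still works, but HttpClient is the usual replacement there.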

HtmlDocument also provides Load methods, which allow loading content from various other sources.
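
As a rough illustration of those other sources, the sketch below loads a document from a file path and from a TextReader, and also shows HtmlWeb, a separate HtmlAgilityPack class that downloads and parses a URL in one step. The file name page.html and the sample markup are hypothetical, and which Load overloads are available may depend on the package build you target.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

class LoadExamples
{
    static void Main()
    {
        // Hypothetical sample markup, used only for this illustration.
        string sample = "<html><body><img src=\"a.png\"></body></html>";

        // From a file on disk ("page.html" is a hypothetical path, created here).
        File.WriteAllText("page.html", sample);
        HtmlDocument fromFile = new HtmlDocument();
        fromFile.Load("page.html");

        // From any TextReader.
        HtmlDocument fromReader = new HtmlDocument();
        using (TextReader reader = new StringReader(sample))
        {
            fromReader.Load(reader);
        }

        // HtmlWeb fetches the page and returns a parsed HtmlDocument.
        HtmlDocument fromWeb = new HtmlWeb().Load("http://archive.ncsa.illinois.edu/primer.html");

        // SelectNodes returns null when there is no match, hence the null checks.
        HtmlNodeCollection fileImgs = fromFile.DocumentNode.SelectNodes("//img");
        HtmlNodeCollection readerImgs = fromReader.DocumentNode.SelectNodes("//img");
        Console.WriteLine(fileImgs != null ? fileImgs.Count : 0);
        Console.WriteLine(readerImgs != null ? readerImgs.Count : 0);
    }
}
```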

