Question

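Here is my C# code. I am trying to scrape data from a website with HtmlAgilityPack, but no matter what I try it finds nothing, and I can't tell what I am doing wrong.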

// Requires: using System; using System.Net; using HtmlAgilityPack;

// Enable TLS 1.2 before the request is made.
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

HtmlAgilityPack.HtmlWeb webb = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = webb.Load("mywebsite");

HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//ul[@class='unstyled']//li//a");

if (nodes != null)
{
    foreach (HtmlNode n in nodes)
    {
        // Decode HTML entities and trim whitespace from the link text.
        string q = n.InnerText;
        q = System.Net.WebUtility.HtmlDecode(q);
        q = q.Trim();
        Console.WriteLine(q);
    }
}
else
{
    Console.WriteLine("nothing found");
}

Here is a picture of the tag I am trying to capture data from. I need the data from the <a> tags.

Answer 1

The XPath used to select the tags is incorrect.

HtmlNodeCollection nodes =
    doc.DocumentNode.SelectNodes("//ul[@class='unstyled']/li/a");

This should select all the anchor nodes. You can then iterate over the nodes to read each one's InnerHtml.

A working example is shown below:

// Requires: using System; using HtmlAgilityPack;

string s = "<ul class='unstyle no-overflow'><li><ul class='unstyled'><li><a href='http://www.smsconnexion.com'>SMS ConneXion</a></li></ul><ul class='unstyled'><li><a href='http://www.celusion.com'>Celusion</a></li></ul></li></ul>";

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(s);

HtmlNodeCollection nodes =
    doc.DocumentNode.SelectNodes("//ul[@class='unstyled']/li/a");

// SelectNodes returns null when nothing matches, so guard before iterating.
if (nodes != null)
{
    foreach (var node in nodes)
    {
        // Print the link target of each anchor.
        Console.WriteLine(node.Attributes["href"].Value);
    }
}

Console.ReadLine();
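
One more thing that may be worth checking: `@class='unstyled'` only matches when the attribute value is exactly `unstyled`. If the `<ul>` on the real page carries additional classes, the predicate will not match and `SelectNodes` returns null. Below is a minimal sketch of a token-based class test, assuming the same HtmlAgilityPack package. The sample markup, the extra `links` class, and the example.com URLs are made up for illustration and are not taken from the original page.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical markup: the list has two classes, so @class='unstyled' would not match it.
        string html =
            "<ul class='unstyled links'>" +
            "<li><a href='http://example.com/a'>First</a></li>" +
            "<li><a href='http://example.com/b'>Second</a></li>" +
            "</ul>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pad the class attribute with spaces so 'unstyled' is matched as a whole token.
        string xpath =
            "//ul[contains(concat(' ', normalize-space(@class), ' '), ' unstyled ')]/li/a";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
        {
            Console.WriteLine("nothing found");
            return;
        }

        foreach (HtmlNode node in nodes)
        {
            // Decode entities in the link text and read href with a safe default.
            string text = WebUtility.HtmlDecode(node.InnerText).Trim();
            string href = node.GetAttributeValue("href", string.Empty);
            Console.WriteLine(text + " -> " + href);
        }
    }
}
```

If this expression still finds nothing against the live site, the list may be generated by JavaScript after the page loads, in which case the HTML that `HtmlWeb.Load` downloads will not contain it at all.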
