C# - HTML Agility Pack - Node collection - Looping through table rows, selecting nth-child(2)

Posted: 2014-09-26 19:23:01

Tags: c# html-agility-pack

I'm using HtmlAgilityPack to scrape a table from a website.

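How can I modify the code below so that, for each row, it returns only the value in the second column (the nth-child(2) cell)?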

using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static void SearchAnimal(string param)
{
    string prm = param;
    string url = "http://xxx/xxx.action?name=";
    //HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url+prm);
    //HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    //StreamReader stream = new StreamReader(response.GetResponseStream());
    //string final_response = stream.ReadToEnd();
    var webGet = new HtmlWeb();
    var doc = webGet.Load(url + prm);

    // This XPath matches every <td> in every row, not one column per row
    HtmlNodeCollection tr = doc.DocumentNode.SelectNodes("//table[@id='animal']//tbody//tr//td");
    if (tr == null) return; // SelectNodes returns null when nothing matches

    for (int i = 0; i < tr.Count; ++i) // < rather than <=, to stay inside the collection
    {
        // Search for a link inside the current node, not the whole collection
        var link = tr[i]
            .Descendants("a")
            .FirstOrDefault(x => x.Attributes["href"] != null);
        if (link == null) continue; // this cell contains no link

        string hrefValue = link.Attributes["href"].Value;
        string name = link.InnerHtml;
        Match match = Regex.Match(hrefValue, @"(\d+)$"); // trailing digits of the href
        Console.ForegroundColor = ConsoleColor.DarkGray;
        Console.WriteLine("Result " + i + ":");
        Console.ForegroundColor = ConsoleColor.Gray;
        Console.WriteLine("Animal Name: " + name);
        Console.WriteLine("Animal Key: " + match.Value);
        Console.WriteLine("-------------------------");
        Console.WriteLine("");
    }
}

1 Answer:

Answer 0 (score: 1)

You can use an XPath positional filter to get only the second <td> child of each <tr>:

//table[@id='animal']//tbody//tr/td[2]
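As a rough sketch of how that expression could replace the question's loop, the snippet below reads the link in the second cell of each row and extracts its trailing digits, as the question's code does. To run without NuGet packages it uses .NET's built-in System.Xml on a small, hypothetical well-formed table (the markup, `SampleHtml` and `ReadSecondColumn` are illustrative names, not from the original). Because HtmlAgilityPack also accepts XPath 1.0, the same expression string should work with `doc.DocumentNode.SelectNodes(...)` on the real page.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical sample markup: a well-formed stand-in for the scraped page.
// The second cell of each row holds the link the question wants to read.
const string SampleHtml =
    "<html><body><table id='animal'><tbody>" +
    "<tr><td>1</td><td><a href='/animal.action?id=101'>Cat</a></td></tr>" +
    "<tr><td>2</td><td><a href='/animal.action?id=202'>Dog</a></td></tr>" +
    "</tbody></table></body></html>";

// Reads the <a> in the second <td> of every row and returns
// (link text, trailing digits of the href) for each one.
static List<(string Name, string Key)> ReadSecondColumn(string html)
{
    var doc = new XmlDocument();
    doc.LoadXml(html);

    var results = new List<(string Name, string Key)>();
    // The [2] predicate is applied per <tr>: "the second <td> child of this row".
    var cells = doc.SelectNodes("//table[@id='animal']//tbody//tr/td[2]");
    foreach (XmlNode cell in cells)
    {
        // First <a> that has an href inside the cell; skip cells without one.
        var link = cell.SelectSingleNode(".//a[@href]") as XmlElement;
        if (link == null) continue;

        string href = link.GetAttribute("href");
        Match match = Regex.Match(href, @"(\d+)$");
        results.Add((link.InnerText, match.Value));
    }
    return results;
}

var rows = ReadSecondColumn(SampleHtml);
for (int i = 0; i < rows.Count; i++)
{
    Console.WriteLine($"Result {i}: Animal Name: {rows[i].Name}, Animal Key: {rows[i].Key}");
}
```

Because the XPath now returns one node per row, the loop index corresponds to the row number, which is what the question's "Result" line seems to intend.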

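The positional filter td[2] is effectively equivalent to the CSS :nth-of-type(2) selector. It gives the same result as :nth-child(2) only when all the child nodes are of the same type (in this case, all the children are <td>).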
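To illustrate the difference, here is a small sketch on a hypothetical row that begins with a <th>, so its children are not all <td>. It again uses System.Xml only so it runs without extra packages. `td[2]` counts only <td> siblings (like `td:nth-of-type(2)`), while `*[2][self::td]` takes the second element child of any type and keeps it only if it is a <td> (like `td:nth-child(2)`).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical row whose first child is a <th>, not a <td>.
const string MixedRowHtml =
    "<table><tr><th>Row</th><td>A</td><td>B</td></tr></table>";

// Returns the text of every node matched by the XPath expression.
static List<string> TextsOf(string xml, string xpath)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);
    var texts = new List<string>();
    foreach (XmlNode node in doc.SelectNodes(xpath))
        texts.Add(node.InnerText);
    return texts;
}

// Second <td> among the row's <td> children (like td:nth-of-type(2)): prints B
Console.WriteLine(string.Join(",", TextsOf(MixedRowHtml, "//tr/td[2]")));
// Second element child, kept only if it is a <td> (like td:nth-child(2)): prints A
Console.WriteLine(string.Join(",", TextsOf(MixedRowHtml, "//tr/*[2][self::td]")));
```

In a table whose rows contain only <td> cells, both expressions select the same nodes.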