XPath in C#: how to get all td elements in a table, grouped by tr

Asked: 2018-03-23 23:50:01

Tags: c# xpath

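I am using XPath in C# to extract all the information from the table on this page: http://es.fifa.com/worldcup/archive/brazil2014/statistics/players/goal-scored.html

Is there a way to extract all the td elements grouped by their tr?

I would like to be able to access them like this: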

for (int x = 0; x < rows.Count; x++)
{
    for (int y = 0; y < rows[x].cells.Count; y++)
    {
        // Print them here or add them to an array
    }
}

How can I do this?

1 answer:

Answer 0: (score: 1)

That web page does not appear to be a valid XML document, so it cannot easily be parsed into an XmlDocument and queried with XPath. It is much easier with Html Agility Pack:
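// Requires the HtmlAgilityPack NuGet package, plus:
// using System; using System.Linq; using HtmlAgilityPack;
var url = "http://es.fifa.com/worldcup/archive/brazil2014/statistics/players/goal-scored.html";
var web = new HtmlWeb(); // HtmlWeb downloads the page itself, so no WebClient is needed
var doc = web.Load(url);

// First element that carries the "tbl-statistics" class
var table = doc.DocumentNode.Descendants().FirstOrDefault(dn => dn.HasClass("tbl-statistics"));

// The leading "." keeps the query relative to the table;
// a bare "//" would search the whole document instead.
var cells = table.SelectNodes(".//tbody/tr/td");

// Cells of the same row share the same parent <tr> node
var cellsGroupedByTr = cells.GroupBy(c => c.ParentNode);

foreach (var group in cellsGroupedByTr)
{
  var tr = group.Key;
  var trCells = group.ToArray();

  var cellStrings = trCells.Select(c => c.InnerText).ToArray();
  Console.WriteLine(string.Join(", ", cellStrings));
}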

Which outputs ...

James RODRIGUEZ, 5, 399, 6, 2, 1, 4, 1, 1
Thomas MUELLER, 7, 682, 5, 3, 1, 1, 4, 0
Neymar, 5, 457, 4, 1, 1, 1, 3, 0
Lionel MESSI, 7, 693, 4, 1, 0, 4, 0, 0
Robin VAN PERSIE, 6, 548, 4, 0, 1, 3, 0, 1
etc ...
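
If the goal is the nested rows[x][y] access from the question rather than a GroupBy, one possible variation is to select the tr nodes first and then run a relative XPath ("./td") on each row. The following is only a sketch under a few assumptions: the TableReader class, the ReadRows method and the inline sample markup are hypothetical names made up for illustration, and it assumes the table element itself carries the tbl-statistics class and has an explicit tbody, which may not match the real page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class TableReader // hypothetical helper, not part of Html Agility Pack
{
    // Returns one string[] per <tr>, holding the text of each <td> in that row.
    public static string[][] ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var rows = doc.DocumentNode.SelectNodes(
            "//table[contains(@class,'tbl-statistics')]/tbody/tr");
        if (rows == null) return new string[0][];

        var result = new List<string[]>();
        foreach (var tr in rows)
        {
            // "./td" is evaluated relative to the current row only.
            var cells = tr.SelectNodes("./td");
            if (cells == null) continue; // skip rows without td cells (e.g. th only)

            result.Add(cells
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToArray());
        }
        return result.ToArray();
    }

    public static void Main()
    {
        // Made-up sample markup standing in for the downloaded page.
        const string html =
            "<table class='tbl-statistics'><tbody>" +
            "<tr><td>James RODRIGUEZ</td><td>5</td><td>399</td></tr>" +
            "<tr><td>Thomas MUELLER</td><td>7</td><td>682</td></tr>" +
            "</tbody></table>";

        string[][] rows = ReadRows(html);

        // The nested loops from the question, over a jagged array.
        for (int x = 0; x < rows.Length; x++)
        {
            for (int y = 0; y < rows[x].Length; y++)
            {
                Console.WriteLine("[" + x + "," + y + "] " + rows[x][y]);
            }
        }
    }
}
```

A jagged array is used here because rows in an HTML table do not have to contain the same number of cells. For the live page, the string passed to ReadRows would come from the downloaded document instead of the sample markup.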