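I am using XPath in C# to extract all the information from the table at: http://es.fifa.com/worldcup/archive/brazil2014/statistics/players/goal-scored.html
Is there a way to extract all the td cells, grouped by the tr row they belong to?
I would like to be able to access them like this: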
    for (int x = 0; x < rows.count; x++)
    {
        for (int y = 0; y < rows[x].cells.count; y++)
        {
            // Print them here or add them to an array
        }
    }
How can I do this?
Answer 0 (score: 1)
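The page does not appear to be a valid XML document, so it cannot easily be loaded into an XmlDocument and queried with XPath. It is much easier with Html Agility Pack:

    using System;
    using System.Linq;
    using HtmlAgilityPack;

    var url = "http://es.fifa.com/worldcup/archive/brazil2014/statistics/players/goal-scored.html";
    var web = new HtmlWeb();
    var doc = web.Load(url);
    // First element that carries the statistics table's CSS class.
    var table = doc.DocumentNode.Descendants().FirstOrDefault(dn => dn.HasClass("tbl-statistics"));
    // The leading "." keeps the XPath relative to the table; a bare "//" searches the whole document.
    var cells = table?.SelectNodes(".//tbody/tr/td");
    if (cells == null)
        throw new InvalidOperationException("Statistics table or its cells were not found.");
    // Group the cells by their parent <tr> so that each group is one table row.
    var cellsGroupedByTr = cells.GroupBy(c => c.ParentNode);
    foreach (var group in cellsGroupedByTr)
    {
        var tr = group.Key; // the <tr> node shared by the cells in this group
        var cellStrings = group.Select(c => c.InnerText).ToArray();
        Console.WriteLine(string.Join(", ", cellStrings));
    }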
This outputs:
James RODRIGUEZ, 5, 399, 6, 2, 1, 4, 1, 1
Thomas MUELLER, 7, 682, 5, 3, 1, 1, 4, 0
Neymar, 5, 457, 4, 1, 1, 1, 3, 0
Lionel MESSI, 7, 693, 4, 1, 0, 4, 0, 0
Robin VAN PERSIE, 6, 548, 4, 0, 1, 3, 0, 1
etc ...
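
If you want the indexed `rows[x][y]` access from the question rather than printed lines, one possible approach is to collect the cells into a jagged array. The following is only a minimal sketch: `TableExtractor` and `ExtractRows` are hypothetical names that are not part of Html Agility Pack, the inline HTML is made-up sample data standing in for the downloaded page, and it assumes the Html Agility Pack NuGet package is installed.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class TableExtractor
{
    // Hypothetical helper (not part of Html Agility Pack):
    // returns one string[] per <tr> that contains at least one <td>.
    public static string[][] ExtractRows(string html, string tableClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // First <table> element that carries the given CSS class.
        var table = doc.DocumentNode
            .Descendants("table")
            .FirstOrDefault(t => t.HasClass(tableClass));
        if (table == null)
            return Array.Empty<string[]>();

        return table.Descendants("tr")
            .Select(tr => tr.Elements("td")
                // Decode HTML entities and strip surrounding whitespace.
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToArray())
            .Where(cells => cells.Length > 0) // drops rows made only of <th> cells
            .ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample markup; in practice this would be the page's HTML.
        const string html =
            "<table class=\"tbl-statistics\">" +
            "<thead><tr><th>Player</th><th>Goals</th></tr></thead>" +
            "<tbody>" +
            "<tr><td>Player A</td><td>3</td></tr>" +
            "<tr><td>Player B</td><td>2</td></tr>" +
            "</tbody></table>";

        string[][] rows = TableExtractor.ExtractRows(html, "tbl-statistics");

        // The nested loop from the question, using array lengths.
        for (int x = 0; x < rows.Length; x++)
        {
            for (int y = 0; y < rows[x].Length; y++)
            {
                Console.WriteLine($"rows[{x}][{y}] = {rows[x][y]}");
            }
        }
    }
}
```

Walking the `tr` elements first and then their `td` children keeps the row order explicit and avoids the grouping step. Note that `Descendants("tr")` would also pick up rows of any nested table, which may or may not be what you want for this page.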