C# HtmlAgilityPack: getting data from a table

Posted: 2016-03-28 18:12:40

Tags: c# html-agility-pack

I'm just trying to learn HtmlAgilityPack. I want to get the data from the whole cast table on an IMDb page (the list of actor names and character names).

Sample film: http://www.imdb.com/title/tt0482571

The element (screenshot): http://prntscr.com/al6jc9

I have identified the element. How can I load this data into a DevExpress GridControl? The GridControl has two columns (Actor, Character).
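A minimal sketch of the grid side, assuming the scraped (actor, character) pairs are already in memory: build a `DataTable` whose two columns match the grid's columns. The class name `CastTable` is hypothetical; only `System.Data` from the .NET base library is used.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: turns (actor, character) pairs into a table whose
// column names match the two grid columns described in the question.
public static class CastTable
{
    public static DataTable Build(IEnumerable<(string Actor, string Character)> pairs)
    {
        var table = new DataTable("Cast");
        table.Columns.Add("Actor", typeof(string));
        table.Columns.Add("Character", typeof(string));

        foreach (var (actor, character) in pairs)
        {
            // Rows.Add(params object[]) fills the columns in the order they were added.
            table.Rows.Add(actor, character);
        }
        return table;
    }
}
```

On a form, something like `gridControl1.DataSource = CastTable.Build(rows);` would then bind it (`gridControl1` and `rows` are hypothetical names). If the grid's columns are defined at design time, setting their `FieldName` to `Actor` and `Character` is what maps each table column to a grid column.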

I currently have the following code:

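    using System.Collections.Generic;
    using System.Data;
    using System.Linq;

    DataTable dt = new DataTable();
    List<string> dd = new List<string>(); // collects the scraped names
    // `html` is assumed to hold the downloaded page source; the document must be loaded before it is queried
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    var _gts = doc.DocumentNode.SelectNodes("//table[@id='cast_list']/tbody//tr//actor");
    if (_gts != null) // SelectNodes returns null, not an empty collection, when nothing matches
    {
        foreach (var item in _gts)
        {
            var s = item.Elements("name").ToList();
            foreach (var item2 in s)
            {
                dd.Add(item2.InnerText);
            }
        }
    }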
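To see how XPath steps and predicates select a table's rows and cells, it can help to try them on a small, well-formed stand-in for the cast table. This sketch swaps in `System.Xml.XmlDocument`, which ships with .NET, for HtmlAgilityPack, because both accept XPath 1.0 through `SelectNodes`/`SelectSingleNode`. Real IMDb HTML is not well-formed XML, so the live page still needs HtmlAgilityPack. The markup below is a hypothetical simplification, not IMDb's actual HTML. It illustrates three points: location steps are separated by `/`; attribute values such as `itemprop='name'` are matched with a predicate in `[...]` rather than used as element names; and there is no `<tbody>`, since browser developer tools often display one that is not present in the page source. `CastXPathDemo` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class CastXPathDemo
{
    // Hypothetical, simplified markup; it must be well-formed XML for XmlDocument.
    public const string SampleMarkup =
        "<html><body><table class='cast_list'>" +
        "<tr><td><a><span itemprop='name'>Actor One</span></a></td>" +
        "<td class='character'><div><a>Character One</a></div></td></tr>" +
        "<tr><td><a><span itemprop='name'>Actor Two</span></a></td>" +
        "<td class='character'><div><a>Character Two</a></div></td></tr>" +
        "</table></body></html>";

    public static List<(string Actor, string Character)> Extract(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        var result = new List<(string Actor, string Character)>();

        // Absolute path: every <tr> that is a direct child of the cast table.
        XmlNodeList rows = doc.SelectNodes("//table[@class='cast_list']/tr");
        if (rows == null) return result;

        foreach (XmlNode row in rows)
        {
            // Relative paths (no leading '/') are evaluated from the current row.
            XmlNode name = row.SelectSingleNode(".//span[@itemprop='name']");
            XmlNode character = row.SelectSingleNode("td[@class='character']/div/a");
            if (name != null && character != null)
            {
                result.Add((name.InnerText, character.InnerText));
            }
        }
        return result;
    }
}
```

Rows without a name or a character link are skipped, which mirrors how the answer below only prints rows where both are found.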

I want to get a result like this:

prntscr.com/al7b83

1 answer:

Answer 0 (score: 0)

You have probably figured this out by now, but for completeness:

    using HtmlAgilityPack;
    using System;
    using System.Linq;
    using System.Net;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Download the raw page HTML (WebClient still works, but it is marked obsolete from .NET 6 on)
                WebClient wc = new WebClient();
                string html = wc.DownloadString("http://www.imdb.com/title/tt0482571/");

                HtmlDocument doc = new HtmlDocument();
                doc.LoadHtml(html);

                // One node per row of the cast table; SelectNodes returns null when nothing matches
                var castListRows = doc.DocumentNode.SelectNodes("//table[@class='cast_list']/tr");
                if (castListRows == null)
                {
                    Console.WriteLine("No cast table found; the page markup may have changed.");
                    return;
                }

                foreach (var castListRow in castListRows)
                {
                    // The actor's name is in the element carrying itemprop="name"
                    var nameNode = castListRow.Descendants()
                        .FirstOrDefault(n => n.GetAttributeValue("itemprop", "") == "name");
                    if (nameNode != null)
                    {
                        // Relative XPath, evaluated from the current row
                        var characterLink = castListRow.SelectSingleNode("td[@class='character']/div/a");
                        if (characterLink != null)
                        {
                            Console.WriteLine("Actor={0}, Character={1}", nameNode.InnerText.Trim(), characterLink.InnerText.Trim());
                        }
                    }
                }
                Console.ReadKey();
            }
        }
    }
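Text taken from cells with `InnerText` or `InnerXml` can still carry HTML entities (`&amp;`, `&#39;`, `&nbsp;`) and the line breaks and indentation of the page source. A small cleanup step before the values go into a `DataTable` or grid may help. This sketch uses the base library's `WebUtility.HtmlDecode` (HtmlAgilityPack's own `HtmlEntity.DeEntitize` is an alternative). The `CellText` name is hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for normalizing scraped cell text.
public static class CellText
{
    public static string Clean(string raw)
    {
        if (raw == null) return string.Empty;

        // Decode entities: "&amp;" -> "&", "&#39;" -> "'", "&nbsp;" -> U+00A0.
        string decoded = WebUtility.HtmlDecode(raw);

        // Collapse any run of whitespace (including U+00A0) to one space, then trim the ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

In the loop above, the console line would then print `CellText.Clean(nameNode.InnerText)` and `CellText.Clean(characterLink.InnerText)` instead of the raw values.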