How do I scrape a table between comments into a GridView using HtmlAgilityPack?

Asked: 2019-01-30 05:05:14

Tags: c# gridview web-scraping datatable html-agility-pack

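I'm currently trying to scrape the table from this page into a GridView using HtmlAgilityPack. I believe my code successfully pulls the table out from between the comments, but when it builds the DataTable it reports that it cannot find column 8, even though there clearly should be no column 8 in this case. I'm fairly new to this and would really appreciate an explanation of what I'm doing wrong.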

private void GetTeamStats()
{
    var webGet = new HtmlWeb();
    var getPage = webGet.Load("https://www.teamrankings.com/nba/stat/effective-field-goal-pct");
    var commentNode = getPage.DocumentNode.SelectNodes("//comment()[contains(.,'table-filters')]/following::*[not(preceding::comment()[contains(.,'main-wrapper')])]");
    var commentHtml = commentNode.Select(c1 => c1.SelectSingleNode("//table"));

    DataTable dt = new DataTable();
    dt.Columns.Add("Rk", typeof(string));
    dt.Columns.Add("Team", typeof(string));
    dt.Columns.Add("2018", typeof(string));
    dt.Columns.Add("Last3", typeof(string));
    dt.Columns.Add("Last1", typeof(string));
    dt.Columns.Add("Home", typeof(string));
    dt.Columns.Add("Away", typeof(string));
    dt.Columns.Add("2017", typeof(string));

    foreach (var table in commentHtml)
    {
        foreach (var row in table.SelectNodes("//tr"))
        {
            var dr = dt.NewRow();
            dt.Rows.Add(dr);

            int i = 0;
            foreach (var cell in row.SelectNodes("//td"))
            {
                dr[i++] = cell.InnerText;
            }
        }

        gvTeamStats.DataSource = dt;
    }
}

The exception reads "System.IndexOutOfRangeException: 'Cannot find column 8.'" and is thrown by this line of code:

                    dr[i++] = cell.InnerText;

1 answer:

Answer 0 (score: 0)

I made a few changes.

In the page source (View Source), the table structure is:

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Team</th>
      <th>2018</th>
      <th>Last 3</th>
      <th>Last 1</th>
      <th>Home</th>
      <th>Away</th>
      <th>2017</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td></td>
    </tr>
  </tbody>
</table>

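using System;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

var webGet = new HtmlWeb();
var getPage = webGet.Load("https://www.teamrankings.com/nba/stat/effective-field-goal-pct");

// Select the header row and the data rows directly instead of walking the comment nodes.
var tableHeader = getPage.DocumentNode.SelectNodes("//table/thead/tr");
var tableData = getPage.DocumentNode.SelectNodes("//table/tbody/tr");

// SelectNodes returns null when nothing matches, so guard before using the results.
if (tableHeader == null || tableData == null)
{
    throw new InvalidOperationException("Table not found in the downloaded page.");
}

DataTable dataTable = new DataTable();

// Build the columns from the <th> cells of the header row.
var headers = tableHeader
    .Elements("th")
    .Select(th => th.InnerText.Trim());

foreach (var header in headers)
{
    dataTable.Columns.Add(header);
}

// Turn each <tr> into an array of trimmed cell texts, using only that row's own <td> children.
var rows = tableData.Select(tr => tr
    .Elements("td")
    .Select(td => td.InnerText.Trim())
    .ToArray());

foreach (var row in rows)
{
    dataTable.Rows.Add(row);
}

// Print the DataTable, one row per line.
foreach (DataRow dataRow in dataTable.Rows)
{
    foreach (var item in dataRow.ItemArray)
    {
        Console.Write(item + " ");
    }
    Console.WriteLine();
}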
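
As for why the original code fails: in XPath, an expression that starts with `//` is evaluated from the document root even when it is called on a child node, so `row.SelectNodes("//td")` most likely returns every cell on the page rather than the cells of that one row, and `i` runs past the eight columns. Prefixing the path with a dot (`.//td`) keeps the search inside the current node. Below is a minimal sketch of the difference, using a hypothetical two-row table loaded from a string rather than the real page; the class name, the `CountCells` helper, and the cell values are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class RelativeXPathDemo
{
    // Hypothetical stand-in for the real page: two rows, two cells each.
    private const string SampleHtml =
        "<table><tbody>" +
        "<tr><td>1</td><td>Team A</td></tr>" +
        "<tr><td>2</td><td>Team B</td></tr>" +
        "</tbody></table>";

    // Counts the nodes the given XPath returns when it is run from the first row.
    public static int CountCells(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        HtmlNode firstRow = doc.DocumentNode.SelectSingleNode("//tbody/tr");
        HtmlNodeCollection cells = firstRow.SelectNodes(xpath);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        return cells == null ? 0 : cells.Count;
    }

    public static void Main()
    {
        // Absolute path: searches the whole document, so all four cells come back.
        Console.WriteLine(CountCells("//td"));  // 4

        // Relative path: searches only inside the first row, so two cells come back.
        Console.WriteLine(CountCells(".//td")); // 2
    }
}
```

In the question's loops the same change would apply to both inner queries, i.e. `table.SelectNodes(".//tr")` and `row.SelectNodes(".//td")`. If `gvTeamStats` is an ASP.NET Web Forms GridView (an assumption based on the tags), it would presumably also need a call to `gvTeamStats.DataBind()` after `DataSource` is set.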