How do I read a specific HTML table with multiple rows in C#?

Posted: 2015-02-05 14:42:11

Tags: c# html linq

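In my spare time I'm working on a hobby project that reads from a hockey statistics database, and for several days I've been stuck trying to get only the data I want. I'm using HTML Agility Pack for parsing and LINQ to pick out the right table. When I read the header tags, they come from the right table, but when I try to read each data row from that specific table, it keeps going back to the beginning of the HTML document and starts from the first table. It feels as if my second loop ignores the position of the table I selected. Here is my code so far: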

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Web;
using System.Data;
using System.Net;
using HtmlAgilityPack;

namespace HTMLDataGatherer
{
    class Program
    {

        static void Main(string[] args)
        {
            string htmlCode = "";

            //Simple html doc initialization
            using (WebClient client = new WebClient())
            {
                client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
                htmlCode =
                    client.DownloadString(
                    "http://stats.hockeyanalysis.com/ratings.php?db=201415&sit=all&type=individual&teamid=0&pos=skaters&minutes=50&disp=1&sort=PCT&sortdir=DESC.html");
            }

            ReadMe(htmlCode);
        }

        static void ReadMe(String sourceHtml)
        {
            HtmlDocument reader = new HtmlDocument();

            reader.LoadHtml(sourceHtml);

            //'Table 3' as in the table I want is the third one from the top
            var table3 = reader.DocumentNode.Descendants("table").Skip(2).FirstOrDefault();

            //Points to the header tags from the first row
            var headers = table3.SelectNodes("//tr//th");

            DataTable table = new DataTable();

            //Adds the header data to 'table' and outputs to console for confirmation
            foreach (HtmlNode header in headers)
            {
                Console.Write(header.InnerText.ToString() + "\t");
                table.Columns.Add(header.InnerText); // create columns from th
            }

            //***************************
            //This starts from first table for some reason
            //I need this foreach to start from the third table, but
            //after it reads the headers it will output rows from the table
            //at the beginning.

            // select rows with td elements
            foreach (var row in reader.DocumentNode.Descendants("table").Skip(2).FirstOrDefault().SelectNodes("//tr[td]"))
            {
                //I am unsure how to access each row of data to write
                Console.Write(row.InnerText.ToString() + "inner\t");
                table.Rows.Add(row.SelectNodes("td").Select(td => td.InnerText).ToArray());
            }
            //***************************
            Console.Read();

        }
    }
}
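One likely cause, offered as a hedged sketch rather than a confirmed answer: in XPath, a path that starts with `//` is absolute, so it is evaluated from the document root no matter which node `SelectNodes` is called on. Only a path that starts with `.` (for example `.//tr[td]`) is relative to the node it is called on. That would explain both symptoms: `//tr//th` still returns the right headers because, on this page, only the stats table appears to use `<th>` cells, while `//tr[td]` collects every row with `<td>` cells in the whole document, starting with the layout tables. The sketch below demonstrates the difference with `System.Xml.XmlDocument`, which follows the same XPath 1.0 rules, on a small hypothetical stand-in page; the class `XPathScopeDemo` and method `CountMatches` are illustrative names only.

```csharp
using System;
using System.Xml;

// Hypothetical demo: XmlDocument stands in for HtmlAgilityPack, and the markup
// is a trimmed, well-formed imitation of the page (two layout tables, then stats).
public static class XPathScopeDemo
{
    public const string Page =
        "<html><body>" +
        "<table><tr><td>banner</td></tr></table>" +
        "<table><tr><td>Season</td><td>Team</td></tr></table>" +
        "<table><tr><th>#</th><th>Player Name</th></tr>" +
        "<tr><td>1</td><td>VORACEK, JAKUB</td></tr>" +
        "<tr><td>2</td><td>MALKIN, EVGENI</td></tr></table>" +
        "</body></html>";

    // Runs an XPath query with the third <table> as the context node
    // (the same table that Descendants("table").Skip(2).FirstOrDefault() picks).
    public static int CountMatches(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Page);
        XmlNode table3 = doc.SelectNodes("//table")[2];
        return table3.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Absolute paths ignore the context node and search the whole document.
        Console.WriteLine("//tr//th  : " + CountMatches("//tr//th"));  // only table 3 has <th>
        Console.WriteLine("//tr[td]  : " + CountMatches("//tr[td]"));  // rows from all three tables
        // A leading "." anchors the search at table3.
        Console.WriteLine(".//tr[td] : " + CountMatches(".//tr[td]")); // rows from table 3 only
    }
}
```

Running it prints 2, 4 and 2: the absolute row query also picks up the two layout rows. With HtmlAgilityPack the analogous change would be calling `table3.SelectNodes(".//tr/th")` and `table3.SelectNodes(".//tr[td]")` instead of the `//` forms.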

Here is the HTML. I'm trying to skip past all the clutter and drop-down menus and go straight to the player data.

<html><head><title>Stats.HockeyAnalysis.com NHL Individual Player Statistics 201415</title><link rel="stylesheet" type="text/css" href="style.css" /><meta name="author" content="HockeyAnalysis.com" /><meta name="description" content="Individual player statistics" /><meta name="keywords" content="hockey, nhl, statistics, advanced stats, fancystats, corsi, fenwick, pdo, analytics" /></head></body><!-- BuySellAds Ad Code -->
<script type="text/javascript">
(function(){
  var bsa = document.createElement('script');
     bsa.type = 'text/javascript';
     bsa.async = true;
     bsa.src = 'http://s3.buysellads.com/ac/bsa.js';
  (document.getElementsByTagName('head')[0]||document.getElementsByTagName('body')[0]).appendChild(bsa);
})();
</script>
<!-- End BuySellAds Ad Code -->

<div class="header"><br><table background-color=#000000><tr><td width=400><a href="index.php"><h1>Stats.HockeyAnalysis.com</h1></a><br><h3>The most complete database of advanced hockey stats</h3><br></td><td valign=top>
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- stats.hockeyanalysis.com leaderboard -->
<ins class="adsbygoogle"
     style="display:inline-block;width:728px;height:90px"
     data-ad-client="ca-pub-5068178113874808"
     data-ad-slot="9146794890"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>



</td></tr>
</table>
</div>
<div class="content">
<br>
<center>
<div id="tswcsstabs">
<ul>
<li><a href="index.php">Home</a></li>
<li><a href="teamstats.php">Team Stats</a></li>
<li><a href="ratings.php">Player Stats</a></li>
<li><a href="players.php">Players</a></li>
<!-- <li><a href="2013playoffs/index.php">Playoff Data</a></li> -->
<li><a href="glossary.php">Glossary</a></li>
<!-- <li><a href="about.php">About Ratings</a></li> -->
<li><a href="faq.php">FAQ</a></li>
<li><a href="http://www.hockeyanalysis.com">Blog</a></li>
<!-- <li><a href="services.php">Services</a></li> -->
<!-- <li><a href="advertise.php">Advertise</a></li> -->
<li><a href="donate.php">Donate</a></li>
</ul>
</div>
<div>
<br>
For a new modern interface to these <a href="http://www.puckalytics.com">Advanced NHL Statistics</a> visit <a href="http://www.puckalytics.com">Puckalytics.com</a>.
</div>
</center>
</div>
<br><br>



<form action="ratings.php" method="get">
<center>
<h2>NHL Player Stats/Ratings</h2>
<table>
<tr>
<td width=250 align="right">
     Season:
    <select name="db" style="width: 180px">
    <option value="201415" selected >2014-15</option>
    <option value="201314"  >2013-14</option>
    <option value="201213"  >2012-13</option>
    <option value="201112"  >2011-12</option>
    <option value="201011"  >2010-11</option>
    <option value="200910"  >2009-10</option>
    <option value="200809"  >2008-09</option>
    <option value="200708"  >2007-08</option>
    <option value="201315"  >2013-15 (2yr)</option>
    <option value="201214"  >2012-14 (2yr)</option>
    <option value="201113"  >2011-13 (2yr)</option>
    <option value="201012"  >2010-12 (2yr)</option>
    <option value="200911"  >2009-11 (2yr)</option>
    <option value="200810"  >2008-10 (2yr)</option>
    <option value="200709"  >2007-09 (2yr)</option>
    <option value="201215"  >2012-15 (3yr)</option>
    <option value="201114"  >2011-14 (3yr)</option>
    <option value="201013"  >2010-13 (3yr)</option>
    <option value="200912"  >2009-12 (3yr)</option>
    <option value="200811"  >2008-11 (3yr)</option>
    <option value="200710"  >2007-10 (3yr)</option>
    <option value="201115"  >2011-15 (4yr)</option>
    <option value="201014"  >2010-14 (4yr)</option>
    <option value="200913"  >2009-13 (4yr)</option>
    <option value="200812"  >2008-12 (4yr)</option>
    <option value="200711"  >2007-11 (4yr)</option>
    <option value="200914"  >2009-14 (5yr)</option>
    <option value="200813"  >2008-13 (5yr)</option>
    <option value="200712"  >2007-12 (5yr)</option>
    <option value="200814"  >2008-14 (6yr)</option>
    <option value="200713"  >2007-13 (6yr)</option>
    <option value="200714"  >2007-14 (7yr)</option>
    </select>
<br>
     Situation:
    <select name="sit" style="width: 180px">
    <option value="5v5"  >5 on 5</option>
    <option value="5v5home"  >5 on 5 Home</option>
    <option value="5v5road"  >5 on 5 Road</option>
    <option value="5v5close"  >5 on 5, Close</option>
    <option value="5v5close_home"  >5 on 5, Close Home</option>
    <option value="5v5close_road"  >5 on 5, Close Road</option>
    <option value="5v5tied"  >5 on 5, Tied</option>
    <option value="5v5tied_home"  >5 on 5, Tied Home</option>
    <option value="5v5tied_road"  >5 on 5, Tied Road</option>
    <option value="5v5leading"  >5 on 5, Leading</option>
    <option value="5v5leading_home"  >5 on 5, Leading Home</option>
    <option value="5v5leading_road"  >5 on 5, Leading Road</option>
    <option value="5v5trailing"  >5 on 5, Trailing</option>
    <option value="5v5trailing_home"  >5 on 5, Trailing Home</option>
    <option value="5v5trailing_road"  >5 on 5, Trailing Road</option>
    <option value="5v5up1"  >5 on 5, Up 1</option>
    <option value="5v5up2"  >5 on 5, Up 2+</option>
    <option value="5v5down1"  >5 on 5, Down1</option>
    <option value="5v5down2"  >5 on 5, Down2+</option>
    <option value="4v4"  >4 on 4</option>
    <option value="all" selected >All Situations</option>

    <option value="5v4"  >5 on 4 PP</option>
    <option value="4v5"  >4 on 5 SH</option>
    <option value="PP"  >All PP</option>
    <option value="SH"  >All SH</option>

    <option value="f10"  >5 on 5 (ZS Adj.)
    <option value="5v5home_f10"  >5 on 5 Home (ZS Adj.)</option>
    <option value="5v5road_f10"  >5 on 5 Road (ZS Adj.)</option>
    <option value="5v5close_f10"  >5 on 5, Close (ZS Adj.)</option>
    <option value="5v5tied_f10"  >5 on 5, Tied (ZS Adj.)</option>
    <option value="5v5up1_f10"  >5 on 5, Up 1 (ZS Adj.)</option>
    <option value="5v5up2_f10"  >5 on 5, Up 2+ (ZS Adj.)</option>
    <option value="5v5down1_f10"  >5 on 5, Down1 (ZS Adj.)</option>
    <option value="5v5down2_f10"  >5 on 5, Down2+ (ZS Adj.)</option>
    <option value="5v5leading_f10"  >5 on 5, Leading (ZS Adj.)</option>
    <option value="5v5trailing_f10"  >5 on 5, Trailing (ZS Adj.)</option>
    </select>

<br>
    Report:
    <select name="type" style="width: 180px">
    <option value="individual" selected >Individual Stats</a>
    <option value="goals"  >On-ice Goal Stats</a>
    <option value="shots"  >On-ice Shot Stats</a>
    <option value="fenwick"  >On-ice Fenwick Stats</a>
    <option value="corsi"  >On-ice Corsi Stats</a>
    </select>

</td>
<td width=250 align="right">
    Select Team:
    <select name="teamid" style="width: 150px">
        <option value="0" selected >All Teams</option>
        <option value="1"  >Anaheim</option>
        <option value="23"  >Arizona</option>
        <option value="3"  >Boston</option>
        <option value="4"  >Buffalo</option>
        <option value="5"  >Carolina</option>
        <option value="6"  >Calgary</option>
        <option value="7"  >Chicago</option>
        <option value="9"  >Colorado</option>
        <option value="8"  >Columbus</option>
        <option value="10"  >Dallas</option>
        <option value="11"  >Detroit</option>
        <option value="12"  >Edmonton</option>
        <option value="13"  >Florida</option>
        <option value="14"  >Los Angeles</option>
        <option value="15"  >Minnesota</option>
        <option value="16"  >Montreal</option>
        <option value="17"  >Nashville</option>
        <option value="18"  >New Jersey</option>
        <option value="19"  >NY Islanders</option>
        <option value="20"  >NY Rangers</option>
        <option value="21"  >Ottawa</option>
        <option value="22"  >Philadelphia</option>
        <option value="24"  >Pittsburgh</option>
        <option value="25"  >San Jose</option>
        <option value="26"  >St. Louis</option>
        <option value="27"  >Tampa Bay</option>
        <option value="28"  >Toronto</option>
        <option value="29"  >Vancouver</option>
        <option value="30"  >Washington</option>
        <option value="2"  >Winnipeg(Atlanta)</option>
    </select>
<br>
    Select Position:
    <select name="pos" style="width: 150px">
        <option value="skaters" selected >All Skaters</option>
        <option value="forwards"  >Forwards</option>
        <option value="defense"  >Defensemen</option>
        <option value="goalies"  >Goalies</option>
    </select>
<br>
    Minutes Played:
    <select name="minutes" style="width: 150px">
        <option value="50" selected >50</option>
        <option value="100"  >100</option>
        <option value="200"  >200</option>
        <option value="300"  >300</option>
        <option value="400"  >400</option>
        <option value="500"  >500</option>
        <option value="750"  >750</option>
        <option value="1000"  >1000</option>
        <option value="1250"  >1250</option>
        <option value="1500"  >1500</option>
        <option value="2000"  >2000</option>
        <option value="2500"  >2500</option>
        <option value="3000"  >3000</option>
        <option value="4000"  >4000</option>
        <option value="5000"  >5000</option>
        <option value="6000"  >6000</option>
        <option value="7500"  >7500</option>
        <option value="10000"  >10000</option>
    </select>
</td>
</tr>
</table>
<input type="hidden" name="disp" value="1">
<input type="hidden" name="sort" value="PCT">
<input type="hidden" name="sortdir" value="DESC">
<input type="submit" value="Update Player Stats" />
<br><br>
</center>
</form>

<table border=1 bgcolor=#aaaaaa>
    <tr>
    <th align=left>#</th>
    <th align=left class="td_name"><a href="ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=name&sortdir=ASC">Player Name</a> </th>
    <th align=left width=75>Team</th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=GP&sortdir=DESC >GP</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=toi&sortdir=DESC >TOI</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=igoals&sortdir=DESC >G</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=iassists&sortdir=DESC >A</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ifassists&sortdir=DESC >FirstA</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ipoints&sortdir=DESC >Points</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ishots&sortdir=DESC >Shots</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ifenwick&sortdir=DESC >iFenwick</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=icorsi&sortdir=DESC >iCorsi</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ishpct&sortdir=DESC >ShPct</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=igoals60&sortdir=DESC >G/60</a></th>
    <th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=iassists60&sortdir=DESC >A/60</a></th>
    <th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ifassists60&sortdir=DESC >FirstA/60</a></th>
    <th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ipoints60&sortdir=ASC >Points/60</a></th>
    <th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ishots60&sortdir=DESC >Shots/60</a></th>
    <th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ifenwick60&sortdir=DESC >iFenwick/60</a></th>
    <th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=icorsi60&sortdir=DESC >iCorsi/60</a></th>
    <th width=55><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=IGP&sortdir=DESC >IGP</a></th>
    <th width=55><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=IAP&sortdir=DESC >IAP</a></th>
    <th width=55><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=IPP&sortdir=DESC >IPP</a></th>

    </tr>



<tr bgcolor=#eedddd>    <td>   1</td>
    <td><a href=showplayer.php?pid=988>VORACEK, JAKUB</td>
    <td>Philadelphia</td>
    <td><center>      51</center></td>
    <td><center>     939:45</center></td>
    <td><center>    17</center></td>
    <td><center>    41</center></td>
    <td><center>    20</center></td>
    <td><center>    58</center></td>
    <td><center>   143</center></td>
    <td><center>   190</center></td>
    <td><center>   265</center></td>
    <td><center>   11.89</center></td>
    <td><center>    1.09</center></td>
    <td><center>    2.62</center></td>
    <td><center>    1.28</center></td>
    <td><strong><center>    3.70</center></strong></td>
    <td><center>    9.13</center></td>
    <td><center>   12.13</center></td>
    <td><center>   16.92</center></td>
    <td><center>    22.7</center></td>
    <td><center>    54.7</center></td>
    <td><center>    77.3</center></td>
    </tr>




<tr bgcolor=#eedddd>    <td>   2</td>
    <td><a href=showplayer.php?pid=458>MALKIN, EVGENI</td>
    <td>Pittsburgh</td>
    <td><center>      46</center></td>
    <td><center>     894:43</center></td>
    <td><center>    20</center></td>
    <td><center>    32</center></td>
    <td><center>    17</center></td>
    <td><center>    52</center></td>
    <td><center>   144</center></td>
    <td><center>   204</center></td>
    <td><center>   256</center></td>
    <td><center>   13.89</center></td>
    <td><center>    1.34</center></td>
    <td><center>    2.15</center></td>
    <td><center>    1.14</center></td>
    <td><strong><center>    3.49</center></strong></td>
    <td><center>    9.66</center></td>
    <td><center>   13.68</center></td>
    <td><center>   17.17</center></td>
    <td><center>    29.0</center></td>
    <td><center>    46.4</center></td>
    <td><center>    75.4</center></td>
    </tr>




<tr bgcolor=#ddddee>    <td>   3</td>
    <td><a href=showplayer.php?pid=1675>TARASENKO, VLADIMIR</td>
    <td>St. Louis</td>
    <td><center>      50</center></td>
    <td><center>     894:27</center></td>
    <td><center>    26</center></td>
    <td><center>    25</center></td>
    <td><center>    14</center></td>
    <td><center>    51</center></td>
    <td><center>   175</center></td>
    <td><center>   254</center></td>
    <td><center>   340</center></td>
    <td><center>   14.86</center></td>
    <td><center>    1.74</center></td>
    <td><center>    1.68</center></td>
    <td><center>    0.94</center></td>
    <td><strong><center>    3.42</center></strong></td>
    <td><center>   11.74</center></td>
    <td><center>   17.04</center></td>
    <td><center>   22.81</center></td>
    <td><center>    34.7</center></td>
    <td><center>    33.3</center></td>
    <td><center>    68.0</center></td>
    </tr>

etc.

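I only want the player data, which happens to be in the third table. If you run this code, it shows the header row correctly (though badly formatted), but then it shows the data rows as if it had jumped back to the first table. Any feedback would be much appreciated!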
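Assuming the problem is the absolute `//` paths (an XPath path beginning with `//` searches from the document root even when `SelectNodes` is called on a specific node), the header and row loops could be scoped to the stats table with relative paths. The following is a minimal sketch, not a tested fix for the live page: `XmlDocument` stands in for HtmlAgilityPack, the markup is a hypothetical well-formed excerpt, and `StatsTableReader`/`Read` are illustrative names. It also selects the table by its `border` attribute instead of by position, which assumes `border=1` stays unique to the stats table, and trims the padding spaces the cells contain (such as `<td>   1</td>`).

```csharp
using System;
using System.Data;
using System.Xml;

// Hypothetical sketch: fill a DataTable from the stats table only.
public static class StatsTableReader
{
    public const string Page =
        "<html><body>" +
        "<table><tr><td>Stats.HockeyAnalysis.com</td></tr></table>" +
        "<table><tr><td>Season:</td><td>Select Team:</td></tr></table>" +
        "<table border='1'>" +
        "<tr><th>#</th><th>Player Name</th><th>Team</th></tr>" +
        "<tr><td>   1</td><td>VORACEK, JAKUB</td><td>Philadelphia</td></tr>" +
        "<tr><td>   2</td><td>MALKIN, EVGENI</td><td>Pittsburgh</td></tr>" +
        "</table></body></html>";

    public static DataTable Read(string html)
    {
        var doc = new XmlDocument();
        doc.LoadXml(html);

        // Select by attribute rather than position (assumes border='1' is unique).
        XmlNode statsTable = doc.SelectSingleNode("//table[@border='1']");

        var table = new DataTable();

        // "./tr/th": header cells of this table only.
        foreach (XmlNode th in statsTable.SelectNodes("./tr/th"))
            table.Columns.Add(th.InnerText.Trim());

        // "./tr[td]": data rows of this table only; the header row has no <td>.
        foreach (XmlNode tr in statsTable.SelectNodes("./tr[td]"))
        {
            XmlNodeList cells = tr.SelectNodes("./td");
            var values = new object[cells.Count];
            for (int i = 0; i < cells.Count; i++)
                values[i] = cells[i].InnerText.Trim(); // drop the padding spaces
            table.Rows.Add(values);
        }
        return table;
    }

    public static void Main()
    {
        DataTable t = Read(Page);
        Console.WriteLine(string.Join("\t", t.Columns[0].ColumnName, t.Columns[1].ColumnName, t.Columns[2].ColumnName));
        foreach (DataRow row in t.Rows)
            Console.WriteLine(string.Join("\t", row.ItemArray));
    }
}
```

On the real page the attribute is unquoted (`border=1`), which an HTML parser such as HtmlAgilityPack should accept but `XmlDocument` would reject; that is why the stand-in markup quotes it.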

0 answers:

No answers yet.