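In my spare time I'm working on a project that reads from a hockey stats database, and for a few days now I've been having trouble getting only the data I want. I'm using HTML Agility Pack for parsing and LINQ methods to pinpoint the correct table. When I read the header tags it points to the right table, but when I try to read each row of data from that specific table, it keeps going back to the beginning of the HTML document and starting from the first table. It feels like my second loop ignores the position of the table I want. Here is my code so far: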
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Web;
using System.Data;
using System.Net;
using HtmlAgilityPack;

namespace HTMLDataGatherer
{
    class Program
    {
        static void Main(string[] args)
        {
            string htmlCode = "";
            //Simple html doc initialization
            using (WebClient client = new WebClient())
            {
                client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
                htmlCode =
                    client.DownloadString(
                        "http://stats.hockeyanalysis.com/ratings.php?db=201415&sit=all&type=individual&teamid=0&pos=skaters&minutes=50&disp=1&sort=PCT&sortdir=DESC.html");
            }
            ReadMe(htmlCode);
        }

        static void ReadMe(String sourceHtml)
        {
            HtmlDocument reader = new HtmlDocument();
            reader.LoadHtml(sourceHtml);
            //'Table 3' as in the table I want is the third one from the top
            var table3 = reader.DocumentNode.Descendants("table").Skip(2).FirstOrDefault();
            //Points to the header tags from the first row
            var headers = table3.SelectNodes("//tr//th");
            DataTable table = new DataTable();
            //Adds the header data to 'table' and outputs to console for confirmation
            foreach (HtmlNode header in headers)
            {
                Console.Write(header.InnerText.ToString() + "\t");
                table.Columns.Add(header.InnerText); // create columns from th
            }
            //***************************
            //This starts from first table for some reason
            //I need this foreach to start from the third table, but
            //after it reads the headers it will output rows from the table
            //at the beginning.
            // select rows with td elements
            foreach (var row in reader.DocumentNode.Descendants("table").Skip(2).FirstOrDefault().SelectNodes("//tr[td]"))
            {
                //I am unsure how to access each row of data to write
                Console.Write(row.InnerText.ToString() + "inner\t");
                table.Rows.Add(row.SelectNodes("td").Select(td => td.InnerText).ToArray());
            }
            //***************************
            Console.Read();
        }
    }
}
Here is the HTML. I'm trying to get past all the junk and drop-down menu markup and go straight to the player data.
<html><head><title>Stats.HockeyAnalysis.com NHL Individual Player Statistics 201415</title><link rel="stylesheet" type="text/css" href="style.css" /><meta name="author" content="HockeyAnalysis.com" /><meta name="description" content="Individual player statistics" /><meta name="keywords" content="hockey, nhl, statistics, advanced stats, fancystats, corsi, fenwick, pdo, analytics" /></head></body><!-- BuySellAds Ad Code -->
<script type="text/javascript">
(function(){
var bsa = document.createElement('script');
bsa.type = 'text/javascript';
bsa.async = true;
bsa.src = 'http://s3.buysellads.com/ac/bsa.js';
(document.getElementsByTagName('head')[0]||document.getElementsByTagName('body')[0]).appendChild(bsa);
})();
</script>
<!-- End BuySellAds Ad Code -->
<div class="header"><br><table background-color=#000000><tr><td width=400><a href="index.php"><h1>Stats.HockeyAnalysis.com</h1></a><br><h3>The most complete database of advanced hockey stats</h3><br></td><td valign=top>
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- stats.hockeyanalysis.com leaderboard -->
<ins class="adsbygoogle"
style="display:inline-block;width:728px;height:90px"
data-ad-client="ca-pub-5068178113874808"
data-ad-slot="9146794890"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</td></tr>
</table>
</div>
<div class="content">
<br>
<center>
<div id="tswcsstabs">
<ul>
<li><a href="index.php">Home</a></li>
<li><a href="teamstats.php">Team Stats</a></li>
<li><a href="ratings.php">Player Stats</a></li>
<li><a href="players.php">Players</a></li>
<!-- <li><a href="2013playoffs/index.php">Playoff Data</a></li> -->
<li><a href="glossary.php">Glossary</a></li>
<!-- <li><a href="about.php">About Ratings</a></li> -->
<li><a href="faq.php">FAQ</a></li>
<li><a href="http://www.hockeyanalysis.com">Blog</a></li>
<!-- <li><a href="services.php">Services</a></li> -->
<!-- <li><a href="advertise.php">Advertise</a></li> -->
<li><a href="donate.php">Donate</a></li>
</ul>
</div>
<div>
<br>
For a new modern interface to these <a href="http://www.puckalytics.com">Advanced NHL Statistics</a> visit <a href="http://www.puckalytics.com">Puckalytics.com</a>.
</div>
</center>
</div>
<br><br>
<form action="ratings.php" method="get">
<center>
<h2>NHL Player Stats/Ratings</h2>
<table>
<tr>
<td width=250 align="right">
Season:
<select name="db" style="width: 180px">
<option value="201415" selected >2014-15</option>
<option value="201314" >2013-14</option>
<option value="201213" >2012-13</option>
<option value="201112" >2011-12</option>
<option value="201011" >2010-11</option>
<option value="200910" >2009-10</option>
<option value="200809" >2008-09</option>
<option value="200708" >2007-08</option>
<option value="201315" >2013-15 (2yr)</option>
<option value="201214" >2012-14 (2yr)</option>
<option value="201113" >2011-13 (2yr)</option>
<option value="201012" >2010-12 (2yr)</option>
<option value="200911" >2009-11 (2yr)</option>
<option value="200810" >2008-10 (2yr)</option>
<option value="200709" >2007-09 (2yr)</option>
<option value="201215" >2012-15 (3yr)</option>
<option value="201114" >2011-14 (3yr)</option>
<option value="201013" >2010-13 (3yr)</option>
<option value="200912" >2009-12 (3yr)</option>
<option value="200811" >2008-11 (3yr)</option>
<option value="200710" >2007-10 (3yr)</option>
<option value="201115" >2011-15 (4yr)</option>
<option value="201014" >2010-14 (4yr)</option>
<option value="200913" >2009-13 (4yr)</option>
<option value="200812" >2008-12 (4yr)</option>
<option value="200711" >2007-11 (4yr)</option>
<option value="200914" >2009-14 (5yr)</option>
<option value="200813" >2008-13 (5yr)</option>
<option value="200712" >2007-12 (5yr)</option>
<option value="200814" >2008-14 (6yr)</option>
<option value="200713" >2007-13 (6yr)</option>
<option value="200714" >2007-14 (7yr)</option>
</select>
<br>
Situation:
<select name="sit" style="width: 180px">
<option value="5v5" >5 on 5</option>
<option value="5v5home" >5 on 5 Home</option>
<option value="5v5road" >5 on 5 Road</option>
<option value="5v5close" >5 on 5, Close</option>
<option value="5v5close_home" >5 on 5, Close Home</option>
<option value="5v5close_road" >5 on 5, Close Road</option>
<option value="5v5tied" >5 on 5, Tied</option>
<option value="5v5tied_home" >5 on 5, Tied Home</option>
<option value="5v5tied_road" >5 on 5, Tied Road</option>
<option value="5v5leading" >5 on 5, Leading</option>
<option value="5v5leading_home" >5 on 5, Leading Home</option>
<option value="5v5leading_road" >5 on 5, Leading Road</option>
<option value="5v5trailing" >5 on 5, Trailing</option>
<option value="5v5trailing_home" >5 on 5, Trailing Home</option>
<option value="5v5trailing_road" >5 on 5, Trailing Road</option>
<option value="5v5up1" >5 on 5, Up 1</option>
<option value="5v5up2" >5 on 5, Up 2+</option>
<option value="5v5down1" >5 on 5, Down1</option>
<option value="5v5down2" >5 on 5, Down2+</option>
<option value="4v4" >4 on 4</option>
<option value="all" selected >All Situations</option>
<option value="5v4" >5 on 4 PP</option>
<option value="4v5" >4 on 5 SH</option>
<option value="PP" >All PP</option>
<option value="SH" >All SH</option>
<option value="f10" >5 on 5 (ZS Adj.)
<option value="5v5home_f10" >5 on 5 Home (ZS Adj.)</option>
<option value="5v5road_f10" >5 on 5 Road (ZS Adj.)</option>
<option value="5v5close_f10" >5 on 5, Close (ZS Adj.)</option>
<option value="5v5tied_f10" >5 on 5, Tied (ZS Adj.)</option>
<option value="5v5up1_f10" >5 on 5, Up 1 (ZS Adj.)</option>
<option value="5v5up2_f10" >5 on 5, Up 2+ (ZS Adj.)</option>
<option value="5v5down1_f10" >5 on 5, Down1 (ZS Adj.)</option>
<option value="5v5down2_f10" >5 on 5, Down2+ (ZS Adj.)</option>
<option value="5v5leading_f10" >5 on 5, Leading (ZS Adj.)</option>
<option value="5v5trailing_f10" >5 on 5, Trailing (ZS Adj.)</option>
</select>
<br>
Report:
<select name="type" style="width: 180px">
<option value="individual" selected >Individual Stats</a>
<option value="goals" >On-ice Goal Stats</a>
<option value="shots" >On-ice Shot Stats</a>
<option value="fenwick" >On-ice Fenwick Stats</a>
<option value="corsi" >On-ice Corsi Stats</a>
</select>
</td>
<td width=250 align="right">
Select Team:
<select name="teamid" style="width: 150px">
<option value="0" selected >All Teams</option>
<option value="1" >Anaheim</option>
<option value="23" >Arizona</option>
<option value="3" >Boston</option>
<option value="4" >Buffalo</option>
<option value="5" >Carolina</option>
<option value="6" >Calgary</option>
<option value="7" >Chicago</option>
<option value="9" >Colorado</option>
<option value="8" >Columbus</option>
<option value="10" >Dallas</option>
<option value="11" >Detroit</option>
<option value="12" >Edmonton</option>
<option value="13" >Florida</option>
<option value="14" >Los Angeles</option>
<option value="15" >Minnesota</option>
<option value="16" >Montreal</option>
<option value="17" >Nashville</option>
<option value="18" >New Jersey</option>
<option value="19" >NY Islanders</option>
<option value="20" >NY Rangers</option>
<option value="21" >Ottawa</option>
<option value="22" >Philadelphia</option>
<option value="24" >Pittsburgh</option>
<option value="25" >San Jose</option>
<option value="26" >St. Louis</option>
<option value="27" >Tampa Bay</option>
<option value="28" >Toronto</option>
<option value="29" >Vancouver</option>
<option value="30" >Washington</option>
<option value="2" >Winnipeg(Atlanta)</option>
</select>
<br>
Select Position:
<select name="pos" style="width: 150px">
<option value="skaters" selected >All Skaters</option>
<option value="forwards" >Forwards</option>
<option value="defense" >Defensemen</option>
<option value="goalies" >Goalies</option>
</select>
<br>
Minutes Played:
<select name="minutes" style="width: 150px">
<option value="50" selected >50</option>
<option value="100" >100</option>
<option value="200" >200</option>
<option value="300" >300</option>
<option value="400" >400</option>
<option value="500" >500</option>
<option value="750" >750</option>
<option value="1000" >1000</option>
<option value="1250" >1250</option>
<option value="1500" >1500</option>
<option value="2000" >2000</option>
<option value="2500" >2500</option>
<option value="3000" >3000</option>
<option value="4000" >4000</option>
<option value="5000" >5000</option>
<option value="6000" >6000</option>
<option value="7500" >7500</option>
<option value="10000" >10000</option>
</select>
</td>
</tr>
</table>
<input type="hidden" name="disp" value="1">
<input type="hidden" name="sort" value="PCT">
<input type="hidden" name="sortdir" value="DESC">
<input type="submit" value="Update Player Stats" />
<br><br>
</center>
</form>
<table border=1 bgcolor=#aaaaaa>
<tr>
<th align=left>#</th>
<th align=left class="td_name"><a href="ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=name&sortdir=ASC">Player Name</a> </th>
<th align=left width=75>Team</th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=GP&sortdir=DESC >GP</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=toi&sortdir=DESC >TOI</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=igoals&sortdir=DESC >G</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=iassists&sortdir=DESC >A</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ifassists&sortdir=DESC >FirstA</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ipoints&sortdir=DESC >Points</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ishots&sortdir=DESC >Shots</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ifenwick&sortdir=DESC >iFenwick</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=icorsi&sortdir=DESC >iCorsi</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ishpct&sortdir=DESC >ShPct</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=igoals60&sortdir=DESC >G/60</a></th>
<th width=45><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=iassists60&sortdir=DESC >A/60</a></th>
<th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ifassists60&sortdir=DESC >FirstA/60</a></th>
<th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ipoints60&sortdir=ASC >Points/60</a></th>
<th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ishots60&sortdir=DESC >Shots/60</a></th>
<th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=ifenwick60&sortdir=DESC >iFenwick/60</a></th>
<th width=60><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=icorsi60&sortdir=DESC >iCorsi/60</a></th>
<th width=55><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=IGP&sortdir=DESC >IGP</a></th>
<th width=55><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=IAP&sortdir=DESC >IAP</a></th>
<th width=55><a href=ratings.php?disp=1&db=201415&sit=all&pos=skaters&minutes=50&teamid=0&type=individual&sort=IPP&sortdir=DESC >IPP</a></th>
</tr>
<tr bgcolor=#eedddd> <td> 1</td>
<td><a href=showplayer.php?pid=988>VORACEK, JAKUB</td>
<td>Philadelphia</td>
<td><center> 51</center></td>
<td><center> 939:45</center></td>
<td><center> 17</center></td>
<td><center> 41</center></td>
<td><center> 20</center></td>
<td><center> 58</center></td>
<td><center> 143</center></td>
<td><center> 190</center></td>
<td><center> 265</center></td>
<td><center> 11.89</center></td>
<td><center> 1.09</center></td>
<td><center> 2.62</center></td>
<td><center> 1.28</center></td>
<td><strong><center> 3.70</center></strong></td>
<td><center> 9.13</center></td>
<td><center> 12.13</center></td>
<td><center> 16.92</center></td>
<td><center> 22.7</center></td>
<td><center> 54.7</center></td>
<td><center> 77.3</center></td>
</tr>
<tr bgcolor=#eedddd> <td> 2</td>
<td><a href=showplayer.php?pid=458>MALKIN, EVGENI</td>
<td>Pittsburgh</td>
<td><center> 46</center></td>
<td><center> 894:43</center></td>
<td><center> 20</center></td>
<td><center> 32</center></td>
<td><center> 17</center></td>
<td><center> 52</center></td>
<td><center> 144</center></td>
<td><center> 204</center></td>
<td><center> 256</center></td>
<td><center> 13.89</center></td>
<td><center> 1.34</center></td>
<td><center> 2.15</center></td>
<td><center> 1.14</center></td>
<td><strong><center> 3.49</center></strong></td>
<td><center> 9.66</center></td>
<td><center> 13.68</center></td>
<td><center> 17.17</center></td>
<td><center> 29.0</center></td>
<td><center> 46.4</center></td>
<td><center> 75.4</center></td>
</tr>
<tr bgcolor=#ddddee> <td> 3</td>
<td><a href=showplayer.php?pid=1675>TARASENKO, VLADIMIR</td>
<td>St. Louis</td>
<td><center> 50</center></td>
<td><center> 894:27</center></td>
<td><center> 26</center></td>
<td><center> 25</center></td>
<td><center> 14</center></td>
<td><center> 51</center></td>
<td><center> 175</center></td>
<td><center> 254</center></td>
<td><center> 340</center></td>
<td><center> 14.86</center></td>
<td><center> 1.74</center></td>
<td><center> 1.68</center></td>
<td><center> 0.94</center></td>
<td><strong><center> 3.42</center></strong></td>
<td><center> 11.74</center></td>
<td><center> 17.04</center></td>
<td><center> 22.81</center></td>
<td><center> 34.7</center></td>
<td><center> 33.3</center></td>
<td><center> 68.0</center></td>
</tr>
etc.
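I only want the player data, which happens to be in the third table. If you run this code, it displays the header row correctly (though badly formatted), but then displays data rows as if it had jumped back to the first table. Any feedback would be greatly appreciated!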
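
A note on the likely cause, offered as a sketch rather than a confirmed fix: in XPath, an expression that starts with `//` is evaluated from the document root no matter which node it is called on, so `SelectNodes("//tr[td]")` would match rows in every table. That would also explain why the headers look right, since only the third table contains `th` cells. Prefixing the expression with a dot (`.//tr[td]`) makes it relative to the context node. The example below uses `System.Xml` from the standard library instead of HTML Agility Pack so that it runs on its own; the `CountRows` helper and the sample markup are made up for illustration. HTML Agility Pack's `SelectNodes` should follow the same XPath rule, but that is worth checking against the real page.

```csharp
using System;
using System.Xml;

public static class RelativeXPathDemo
{
    // Hypothetical helper: counts the nodes an XPath expression selects
    // when evaluated with 'context' as the context node.
    public static int CountRows(XmlNode context, string xpath)
    {
        return context.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Made-up markup with three tables, standing in for the real page.
        var doc = new XmlDocument();
        doc.LoadXml(
            "<root>" +
            "<table><tr><td>banner</td></tr></table>" +
            "<table><tr><td>form</td></tr></table>" +
            "<table><tr><th>Player Name</th></tr><tr><td>VORACEK, JAKUB</td></tr></table>" +
            "</root>");

        // Third table from the top, as in the question.
        XmlNode table3 = doc.SelectNodes("//table")[2];

        // "//" restarts at the document root: rows from all three tables match.
        Console.WriteLine(CountRows(table3, "//tr[td]"));   // 3

        // ".//" stays inside table3: only its own data row matches.
        Console.WriteLine(CountRows(table3, ".//tr[td]"));  // 1
    }
}
```

Under that assumption, the equivalent change in the code above would be to call `table3.SelectNodes(".//tr[td]")` (and `".//tr//th"` for the headers) rather than the versions with a leading `//`.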