I'm working with some HTML tables and trying to scrape them with HtmlAgilityPack. The source HTML can be found here: https://www.ultimate-guitar.com/search.php?title=breaking+benjamin+polyamorous&type%5B1%5D=200&rating%5B0%5D=4&rating%5B1%5D=5 Sample table:
<table cellspacing="1" class="tresults">
<tbody>
<tr>
<th width="175">Artist :</th>
<th>Song :</th>
<th width="115">Rating :</th>
<th width="80">Type :</th>
</tr>
<tr>
<td>
<a href="/tabs/breaking_benjamin_tabs.htm" class="song search_art">
<b>Breaking</b> <b>Benjamin</b>
</a>
</td>
<td>
<a target="_blank" href="http://plus.ultimate-guitar.com/tp/?artist=Breaking+Benjamin&song=Polyamorous" class="song js-tp_link"><b>Polyamorous</b></a>
<a target="_blank" class="js-tp_link" href="http://plus.ultimate-guitar.com/tp/?artist=Breaking+Benjamin&song=Polyamorous"><b
class="play_tab_list"title="Playback"></b></a>
</td>
<td class="gray4"></td>
<td><strong>tab pro</strong>
</td>
</tr>
<tr class="stripe">
<td> </td>
<td>
<a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver2_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 2)</a>
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">5</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr>
<td> </td>
<td>
<a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver4_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 4)</a>
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">30</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr class="stripe">
<td> </td>
<td>
<a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver5_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 5)</a>
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">12</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr>
<td> </td>
<td>
<a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver6_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 6)</a>
<span rel="#info_333408" class="tabinfo">info</span>
<div class="dn" id="info_333408">
<font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Difficulty:</b> <font color="#DDDDCC">novice</font>
<br>
</font>
</div>
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">20</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr class="stripe">
<td> </td>
<td>
<a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver7_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 7)</a>
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">5</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr>
<td> </td>
<td>
<a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver8_tab_952279id_24052010date.htm" class="song result-link"><b>Polyamorous</b> (ver 8)</a>
<span rel="#info_952279" class="tabinfo">info</span>
<div class="dn" id="info_952279">
<font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Difficulty:</b> <font color="#DDDDCC">novice</font>
<br>
</font>
<p style="margin-top:3px"><font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Tuning:</b> <font color="#DDDDCC">Drop C#</font></font>
</p>
</div>
</td>
<td class="gray4"><span class="rating"><span class="r_5"></span></span> <span>[ <b class="ratdig">6</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr class="stripe">
<td> </td>
<td>
<a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_acoustic_tab.htm" class="song result-link"><b>Polyamorous</b> Acoustic</a>
<span rel="#info_258880" class="tabinfo">info</span>
<div class="dn" id="info_258880">
<font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Difficulty:</b> <font color="#DDDDCC">novice</font>
<br>
</font>
</div>
</td>
<td class="gray4"><span class="rating"><span class="r_5"></span></span> <span>[ <b class="ratdig">9</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
</tbody>
</table>
To pull this table out of the full HTML document, I'm using the following C# snippet:
string source_code = web.DownloadString("https://www.ultimate-guitar.com/search.php?title="+ songArtist + songTitle + "&type%5B1%5D=200&rating%5B0%5D=4&rating%5B1%5D=5");
doc.LoadHtml(source_code);
HtmlNode resultsTable = doc.DocumentNode.SelectSingleNode("//table[@class='tresults']");
foreach(var cell in resultsTable.Descendants())
{
Console.WriteLine(cell.InnerHtml);
}
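I expected this to print the entire contents of the table, but the output stops at this line: <b class="play_tab_list" title="Playback"></b>
My end goal is to return all the links in the table, but I can't even see the full table.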
Answer 0 (score: 0)
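This code prints the text and URL of every link in the document whose class attribute contains "link". In the sample table, that covers the result-link and js-tp_link anchors, but not the artist link (class "song search_art").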
var doc = new HtmlDocument();
var web = new WebClient();
string source_code = web.DownloadString("https://www.ultimate-guitar.com/search.php?title=breaking+benjamin+polyamorous&type[1]=200&rating[0]=4&rating[1]=5");
doc.LoadHtml(source_code);
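// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[contains(@class,'link')]");
if (links != null)
{
    foreach (var link in links)
    {
        // GetAttributeValue falls back to "" if an anchor has no href.
        Console.WriteLine("{0} {1}", link.InnerText, link.GetAttributeValue("href", ""));
    }
}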
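
If the goal is every link inside the results table only, rather than every anchor with "link" in its class, one option is to scope the XPath to the table itself. The following is a minimal sketch, not a tested fix for the live page: it assumes the HtmlAgilityPack NuGet package is installed, the class name TableLinks and the method ExtractHrefs are hypothetical names introduced here for illustration, and the inline HTML is a trimmed stand-in for the sample table from the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, assumed to be installed

public static class TableLinks
{
    // Hypothetical helper (not from the original posts): returns the href of
    // every anchor inside a table whose class attribute is exactly "tresults".
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // The leading //table[...] limits the search to the results table;
        // a[@href] skips anchors that have no href attribute.
        HtmlNodeCollection anchors =
            doc.DocumentNode.SelectNodes("//table[@class='tresults']//a[@href]");

        // SelectNodes returns null when the XPath matches nothing.
        if (anchors == null)
        {
            return hrefs;
        }

        foreach (HtmlNode anchor in anchors)
        {
            hrefs.Add(anchor.GetAttributeValue("href", ""));
        }
        return hrefs;
    }

    public static void Main()
    {
        // Trimmed stand-in for the sample table, plus one link outside it.
        string html =
            "<table cellspacing=\"1\" class=\"tresults\"><tbody>" +
            "<tr><td><a href=\"/tabs/breaking_benjamin_tabs.htm\" class=\"song search_art\">Breaking Benjamin</a></td></tr>" +
            "<tr><td><a href=\"https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver2_tab.htm\" class=\"song result-link\">Polyamorous (ver 2)</a></td></tr>" +
            "</tbody></table>" +
            "<a href=\"/outside\" class=\"nav-link\">Outside the table</a>";

        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

Unlike the class-based filter, this version also picks up the artist link, and it ignores anchors elsewhere on the page that happen to have "link" in their class. Against the live site the table's class name or markup may differ from the sample, so the XPath should be treated as an assumption to verify.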