HtmlAgilityPack and trouble returning the full table

Time: 2016-03-25 17:23:54

Tags: c# html-agility-pack

I'm working with some HTML tables and trying to mine them with HtmlAgilityPack. The source HTML can be found here: https://www.ultimate-guitar.com/search.php?title=breaking+benjamin+polyamorous&type%5B1%5D=200&rating%5B0%5D=4&rating%5B1%5D=5

Sample table:

<table cellspacing="1" class="tresults">
  <tbody>
    <tr>
      <th width="175">Artist :</th>
      <th>Song :</th>
      <th width="115">Rating :</th>
      <th width="80">Type :</th>
    </tr>
    <tr>
      <td>
        <a href="/tabs/breaking_benjamin_tabs.htm" class="song search_art">
          <b>Breaking</b>  <b>Benjamin</b> 
        </a>
      </td>
      <td>
        <a target="_blank" href="http://plus.ultimate-guitar.com/tp/?artist=Breaking+Benjamin&amp;song=Polyamorous" class="song js-tp_link"><b>Polyamorous</b></a>
        <a target="_blank" class="js-tp_link" href="http://plus.ultimate-guitar.com/tp/?artist=Breaking+Benjamin&amp;song=Polyamorous"><b 
class="play_tab_list"title="Playback"></b></a>
      </td>



      <td class="gray4"></td>
      <td><strong>tab pro</strong>
      </td>
    </tr>
    <tr class="stripe">
      <td>&nbsp;</td>
      <td>
        <a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver2_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 2)</a>
      </td>
      <td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">5</b> ]</span>
      </td>
      <td><strong>tab</strong>
      </td>
    </tr>
    <tr>
      <td>&nbsp;</td>
      <td>
        <a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver4_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 4)</a>
      </td>
      <td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">30</b> ]</span>
      </td>
      <td><strong>tab</strong>
      </td>
    </tr>
    <tr class="stripe">
      <td>&nbsp;</td>
      <td>
        <a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver5_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 5)</a>
      </td>
      <td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">12</b> ]</span>
      </td>
      <td><strong>tab</strong>
      </td>
    </tr>
    <tr>
      <td>&nbsp;</td>
      <td>
        <a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver6_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 6)</a>
        &nbsp;
        <span rel="#info_333408" class="tabinfo">info</span>
        <div class="dn" id="info_333408">
          <font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Difficulty:</b> <font color="#DDDDCC">novice</font>
          <br>
          </font>
        </div>
      </td>
      <td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">20</b> ]</span>
      </td>
      <td><strong>tab</strong>
      </td>
    </tr>
    <tr class="stripe">
      <td>&nbsp;</td>
      <td>
        <a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver7_tab.htm" class="song result-link"><b>Polyamorous</b> (ver 7)</a>
      </td>
      <td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">5</b> ]</span>
      </td>
      <td><strong>tab</strong>
      </td>
    </tr>
    <tr>
      <td>&nbsp;</td>
      <td>
        <a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_ver8_tab_952279id_24052010date.htm" class="song result-link"><b>Polyamorous</b> (ver 8)</a>
        &nbsp;
        <span rel="#info_952279" class="tabinfo">info</span>
        <div class="dn" id="info_952279">
          <font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Difficulty:</b> <font color="#DDDDCC">novice</font>
          <br>
          </font>
          <p style="margin-top:3px"><font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Tuning:</b> <font color="#DDDDCC">Drop C#</font></font>
          </p>
        </div>
      </td>
      <td class="gray4"><span class="rating"><span class="r_5"></span></span> <span>[ <b class="ratdig">6</b> ]</span>
      </td>
      <td><strong>tab</strong>
      </td>
    </tr>
    <tr class="stripe">
      <td>&nbsp;</td>
      <td>
        <a href="https://tabs.ultimate-guitar.com/b/breaking_benjamin/polyamorous_acoustic_tab.htm" class="song result-link"><b>Polyamorous</b>&nbsp;Acoustic</a>
        &nbsp;
        <span rel="#info_258880" class="tabinfo">info</span>
        <div class="dn" id="info_258880">
          <font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Difficulty:</b> <font color="#DDDDCC">novice</font>
          <br>
          </font>
        </div>
      </td>
      <td class="gray4"><span class="rating"><span class="r_5"></span></span> <span>[ <b class="ratdig">9</b> ]</span>
      </td>
      <td><strong>tab</strong>
      </td>
    </tr>
  </tbody>
</table>

To get this table out of the full HTML document, I use the following C# snippet:

using System;
using System.Net;
using HtmlAgilityPack;
var web = new WebClient();
var doc = new HtmlDocument();
string source_code = web.DownloadString("https://www.ultimate-guitar.com/search.php?title=" + songArtist + songTitle + "&type%5B1%5D=200&rating%5B0%5D=4&rating%5B1%5D=5");
doc.LoadHtml(source_code);
// SelectSingleNode returns one HtmlNode, not an HtmlNodeCollection
HtmlNode resultsTable = doc.DocumentNode.SelectSingleNode("//table[@class='tresults']");
foreach (var cell in resultsTable.Descendants())
{
    Console.WriteLine(cell.InnerHtml);
}
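In the snippet above, `songArtist + songTitle` is concatenated with no separator and no URL encoding, so (for example) "Breaking Benjamin" and "Polyamorous" would put `Breaking BenjaminPolyamorous` into the query string. A minimal sketch of building the URL more defensively, assuming the site expects artist and song together in the single `title` parameter separated by a space, as in the example URL at the top of the question. The class `SearchUrl` and its `Build` method are hypothetical names; `WebUtility.UrlEncode` is the standard `System.Net` method and encodes spaces as `+`.

```csharp
using System.Net;

// Hypothetical helper: builds the search URL from separate artist and song.
// Assumption: the site wants both in the single "title" parameter,
// separated by a space, matching the example URL in the question.
public static class SearchUrl
{
    private const string BaseUrl = "https://www.ultimate-guitar.com/search.php";

    // Filter parameters copied verbatim from the question's URL.
    private const string Filters = "&type%5B1%5D=200&rating%5B0%5D=4&rating%5B1%5D=5";

    public static string Build(string songArtist, string songTitle)
    {
        // Join with a space so artist and song do not run together.
        string title = songArtist.Trim() + " " + songTitle.Trim();

        // UrlEncode turns spaces into '+' and percent-encodes characters
        // such as '&' or '#' that would otherwise break the query string.
        return BaseUrl + "?title=" + WebUtility.UrlEncode(title) + Filters;
    }
}
```

With the question's variables this would be used as `string source_code = web.DownloadString(SearchUrl.Build(songArtist, songTitle));`.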

I expect this to return the entire contents of the table, but the output stops at this line: <b class="play_tab_list" title="Playback"></b>

My end goal is to return all the links in the table, but I can't even see the full table.

1 answer:

Answer 0 (score: 0)

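This code prints the text and URL of every <a> element whose class contains "link"; in the results table these are the js-tp_link and result-link anchors.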

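        using System;
        using System.Net;
        using HtmlAgilityPack;

        var doc = new HtmlDocument();
        var web = new WebClient();
        string source_code = web.DownloadString("https://www.ultimate-guitar.com/search.php?title=breaking+benjamin+polyamorous&type[1]=200&rating[0]=4&rating[1]=5");
        doc.LoadHtml(source_code);
        // Matches anchors whose class contains "link" (js-tp_link, result-link)
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[contains(@class,'link')]");
        // SelectNodes returns null, not an empty collection, when nothing matches
        if (links != null)
        {
            foreach (var link in links)
            {
                Console.WriteLine("{0} {1}", link.InnerText, link.GetAttributeValue("href", ""));
            }
        }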
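Since the stated goal is the links in the table, another option is to scope the XPath to the `tresults` table rather than matching class names across the whole page. This is a sketch, assuming the HtmlAgilityPack NuGet package is referenced and that the table keeps `class="tresults"` as in the sample HTML; `ResultLinks.Extract` is a hypothetical helper name. It takes the HTML as a string, so it can be fed `source_code` from either snippet above.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: collects the href of every link inside the
// results table only. Assumes the table has class="tresults".
public static class ResultLinks
{
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // Scope the query to the table; "//a[@href]" alone would also
        // match links elsewhere on the page.
        HtmlNodeCollection links =
            doc.DocumentNode.SelectNodes("//table[@class='tresults']//a[@href]");

        // SelectNodes returns null when nothing matches.
        if (links == null)
            return hrefs;

        foreach (HtmlNode link in links)
        {
            // Attribute values come back as written in the markup
            // (e.g. "&amp;"), so decode entities to get a usable URL.
            hrefs.Add(HtmlEntity.DeEntitize(link.GetAttributeValue("href", "")));
        }
        return hrefs;
    }
}
```

For example: `foreach (string href in ResultLinks.Extract(source_code)) Console.WriteLine(href);`. Scoping by table avoids depending on the anchors' class names, at the cost of depending on the table's class instead.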