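Can I use HtmlAgilityPack to split an HTML document on a specific tag?

Time: 2011-05-11 20:59:35

Tags: c# .net html html-agility-pack

For example, I have a bunch of <tr> tags that I want to collect. I need to split each one out into a separate element so that I can parse it myself.

Is this possible?

An example of the markup: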

<tr class="first-in-year">
  <td class="year">2011</td>

  <td class="img"><a href="/battlefield-3/61-27006/"><img src=
  "http://media.giantbomb.com/uploads/6/63038/1700748-bf3_thumb.jpg" alt=""></a></td>

  <td class="title">
    <a href="/battlefield-3/61-27006/">Battlefield 3</a>

    <p class="deck">Battlefield 3 is DICE's next installment in the franchise and
    will be on PC, PS3 and Xbox 360. The game will feature jets, prone, a
    single-player and co-op campaign, and 64-player multiplayer (on PC). It's due out
    in Fall of 2011.</p>
  </td>

  <td class="date">Expected: Q4 2011</td>

  <td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/xbox-360/60-20/" class=
  "X360">X360</a>, <a href="/playstation-3/60-35/" class="PS3">PS3</a></td>
</tr>

<tr>
  <td class="year"></td>

  <td class="img"><a href="/forza-motorsport-4/61-33400/"><img src=
  "http://media.giantbomb.com/uploads/0/1992/1654849-forza4_thumb.jpg" alt=
  ""></a></td>

  <td class="title">
    <a href="/forza-motorsport-4/61-33400/">Forza Motorsport 4</a>

    <p class="deck">The next installment of Turn 10's racing franchise slated for
    release in Fall 2011. It is set to feature 16 player online races, dynamic race
    conditions, cars from over 80 manufacturers, and compatibility with Kinect, both
    on and off the racetrack.</p>
  </td>

  <td class="date">Expected: Oct 2011</td>

  <td><a href="/xbox-360/60-20/" class="X360">X360</a></td>
</tr>

<tr>
  <td class="year"></td>

  <td class="img"><a href="/max-payne-3/61-23398/"><img src=
  "http://media.giantbomb.com/uploads/0/1400/938434-custom_1237811317319_mp3_poster_thumb.jpg"
  alt=""></a></td>

  <td class="title">
    <a href="/max-payne-3/61-23398/">Max Payne 3</a>

    <p class="deck">The long awaited third instalment in Remedy's beloved series, in
    which an aging Max Payne faces one final chance to redeem himself.</p>
  </td>

  <td class="date">Expected: 2011</td>

  <td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/playstation-3/60-35/" class=
  "PS3">PS3</a>, <a href="/xbox-360/60-20/" class="X360">X360</a></td>
</tr>

So in this example I would end up with three elements. :)

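1 Answer:

Answer 0 (score: 2)

You can't split it into multiple HTML documents on a tag, if that's what you mean. You can, however, select the individual TD elements and parse them separately.

The XPath selector //td selects every TD element in the document; you can then pass each of those nodes to your parse method. To work row by row, as in the question, the selector //tr works the same way.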

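// LoadHtmlHowever() is a placeholder for however you load the document.
HtmlAgilityPack.HtmlDocument doc = LoadHtmlHowever();
// SelectNodes returns null when nothing matches, so check before iterating.
HtmlAgilityPack.HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//td");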
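
The same approach can be applied per row rather than per cell. The following is only a minimal sketch, assuming the markup from the question is available as a string and wrapped in a <table>. The class name RowTitles and the method FromHtml are hypothetical names made up for this illustration; the HtmlAgilityPack calls used are LoadHtml, SelectNodes, SelectSingleNode and InnerText.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RowTitles
{
    // Hypothetical helper: returns the text of the title link in each <tr>.
    public static List<string> FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var titles = new List<string>();

        // Each <tr> comes back as its own HtmlNode.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return titles;
        }

        foreach (HtmlNode row in rows)
        {
            // The leading "." keeps the query relative to the current row;
            // a bare "//td" would search the whole document again.
            HtmlNode link = row.SelectSingleNode(".//td[@class='title']/a");
            if (link != null)
            {
                titles.Add(link.InnerText.Trim());
            }
        }

        return titles;
    }
}
```

Calling RowTitles.FromHtml(html) on the markup from the question should give one entry per row. If you need the raw markup of a row instead, row.OuterHtml returns it as a string.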