How to read the 'n' <td> tags that follow each selected node

Time: 2015-08-16 16:25:36

Tags: vb.net xpath html-agility-pack

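I am trying to use VB.NET and HTML Agility Pack (HAP) to get every player's stats from the following HTML table, but I can't figure out how to select the <td> tags that come after each player's name within the row.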

    <table class="stats" cellspacing="0">
   <tr class="statsgreen">
      <td colspan="10" class="estverdel">Team A</td>
      <td colspan="2">REB</td>
      <td colspan="4">&nbsp;</td>
      <td colspan="2">BLK</td>
      <td>&nbsp;</td>
      <td colspan="2">PF</td>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
   </tr>
   <tr class="statsgreen">
      <td>Num</td>
      <td>Name</td>
      <td>Min</td>
      <td>GS</td>
      <td>T2</td>
      <td>T2 %</td>
      <td>T3</td>
      <td>T3 %</td>
      <td>T1</td>
      <td>T1 %</td>
      <td>T</td>
      <td>D+O</td>
      <td>A</td>
      <td>ST</td>
      <td>LO</td>
      <td>C</td>
      <td>R</td>
      <td>C</td>
      <td>M</td>
      <td>R</td>
      <td>C</td>
      <td>+/-</td>
      <td>PIE</td>
   </tr>   
    <tr>
      <td>6</td>
      <td><a href="/player.php?id=001">Player 1</a></td>
      <td>30:22</td>
      <td>18</td>
      <td>4/10</td>
      <td>40%</td>
      <td>2/6</td>
      <td>33%</td>
      <td>4/4</td>
      <td>100%</td>
      <td>9</td>
      <td>5+4</td>
      <td>1</td>
      <td>1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>4</td>
      <td>10</td>
      <td>20</td>
   </tr>
   <tr>
      <td>6</td>
      <td><a href="/player.php?id=002">Player 2</a></td>
      <td>30:22</td>
      <td>18</td>
      <td>4/10</td>
      <td>40%</td>
      <td>2/6</td>
      <td>33%</td>
      <td>4/4</td>
      <td>100%</td>
      <td>9</td>
      <td>5+4</td>
      <td>1</td>
      <td>1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>4</td>
      <td>10</td>
      <td>20</td>
   </tr>
   ...
   ...
   <tr class="statsgreen">
      <td colspan="10" class="estverdel">Team B</td>
      <td colspan="2">REB</td>
      <td colspan="4">&nbsp;</td>
      <td colspan="2">BLK</td>
      <td>&nbsp;</td>
      <td colspan="2">PF</td>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
   </tr>
   <tr class="statsgreen">
      <td>Num</td>
      <td>Name</td>
      <td>Min</td>
      <td>GS</td>
      <td>T2</td>
      <td>T2 %</td>
      <td>T3</td>
      <td>T3 %</td>
      <td>T1</td>
      <td>T1 %</td>
      <td>T</td>
      <td>D+O</td>
      <td>A</td>
      <td>ST</td>
      <td>LO</td>
      <td>C</td>
      <td>R</td>
      <td>C</td>
      <td>M</td>
      <td>R</td>
      <td>C</td>
      <td>+/-</td>
      <td>PIE</td>
   </tr>   
    <tr>
      <td>6</td>
      <td><a href="/player.php?id=013">Player 13</a></td>
      <td>30:22</td>
      <td>18</td>
      <td>4/10</td>
      <td>40%</td>
      <td>2/6</td>
      <td>33%</td>
      <td>4/4</td>
      <td>100%</td>
      <td>9</td>
      <td>5+4</td>
      <td>1</td>
      <td>1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>4</td>
      <td>10</td>
      <td>20</td>
   </tr>
</table>

This is my incomplete VB.NET code, which only gets the team and player names:

Private Sub btnGetStats_Click(sender As Object, e As EventArgs) Handles btnGetStats.Click
    Dim doc As New HtmlDocument                    
    doc.Load("C:\001.html")

    'Get team names  
    For Each nodeteams As HtmlNode In doc.DocumentNode.SelectNodes("//td[@class=""estverdel""]")                    
        MessageBox.Show("Team: " + nodeteams.InnerText)                
    Next

    'Get player names
    For Each nodeplayers As HtmlNode In doc.DocumentNode.SelectNodes("//a[contains(@href, '/player')]")
        MessageBox.Show(nodeplayers.InnerText)    
    Next
End Sub

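Is there an XPath expression I can use to select each player node and then step through each of the 21 stat fields that follow it?

As an alternative, I suppose I could read nodeplayers.Line and then use System.IO.StreamReader to read the next 21 lines, but perhaps HAP can do this in a smarter way.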

1 Answer:

Answer 0: (score: 0)

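One possibility is to use the ParentNode property of the player's HtmlNode object:

  • Get the parent of the parent of each player node found (the tr node in <tr><td><a player>...)
  • Get all of its child nodes (all the td nodes)
  • Use LINQ Skip to skip the first two child nodes (the number and the player link)
  • Take the remaining child nodes

Modify your second loop: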

   'Get player names and stats
   For Each nodeplayers As HtmlNode In doc.DocumentNode.SelectNodes("//a[contains(@href, '/player')]")
        Console.WriteLine("Player: " & nodeplayers.InnerText)
        ' Go up from the link (a) to its cell (td) and then to the row (tr).
        ' Elements("td") returns only the td cells, so whitespace text nodes
        ' between the cells are not counted; skip the first two cells and take the rest.
        For Each node As HtmlNode In nodeplayers.ParentNode.ParentNode.Elements("td").Skip(2)
            Console.WriteLine(node.InnerText)
        Next
   Next

This returns all the values for each player:

Team: Team A
Team: Team B
Player: Player 1   
30:22  
18
4/10 
40% 
2/6
33% 
4/4
100% 
9 
5+4  
1  
1 
0    
0   
0  
0    
0    
3    
4    
10    
20
...
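
Since the question asks specifically for an XPath expression, another option is the following-sibling axis, evaluated relative to each player link. The snippet below is only a sketch under a few assumptions: the module name PlayerStatsSketch is made up for illustration, the file path is the one from the question, and each player link sits directly inside its own td as in the posted table. From the link, ".." steps up to the enclosing td, and following-sibling::td selects the td cells after it in the same row.

```vbnet
Imports System
Imports HtmlAgilityPack

Module PlayerStatsSketch   ' hypothetical name, for illustration only
    Sub Main()
        Dim doc As New HtmlDocument()
        doc.Load("C:\001.html")   ' same path as in the question

        Dim players As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//a[contains(@href, '/player')]")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches
        If players Is Nothing Then Return

        For Each player As HtmlNode In players
            Console.WriteLine("Player: " & player.InnerText)

            ' ".." is the td wrapping the link; following-sibling::td picks
            ' only the td elements that come after it in the same tr
            Dim stats As HtmlNodeCollection =
                player.SelectNodes("../following-sibling::td")
            If stats Is Nothing Then Continue For

            For Each stat As HtmlNode In stats
                Console.WriteLine(stat.InnerText.Trim())
            Next
        Next
    End Sub
End Module
```

Because the expression selects td elements only, the whitespace text nodes between cells never show up in the result, and the Nothing checks avoid an exception on pages where a row or the whole table is missing.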