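I'm trying to get each player's stats from the following HTML table using VB .NET and the HTML Agility Pack (HAP), but I don't know how to select the <td> tags that follow each player's name in a row.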
<table class="stats" cellspacing="0">
<tr class="statsgreen">
<td colspan="10" class="estverdel">Team A</td>
<td colspan="2">REB</td>
<td colspan="4"> </td>
<td colspan="2">BLK</td>
<td> </td>
<td colspan="2">PF</td>
<td> </td>
<td> </td>
</tr>
<tr class="statsgreen">
<td>Num</td>
<td>Name</td>
<td>Min</td>
<td>GS</td>
<td>T2</td>
<td>T2 %</td>
<td>T3</td>
<td>T3 %</td>
<td>T1</td>
<td>T1 %</td>
<td>T</td>
<td>D+O</td>
<td>A</td>
<td>ST</td>
<td>LO</td>
<td>C</td>
<td>R</td>
<td>C</td>
<td>M</td>
<td>R</td>
<td>C</td>
<td>+/-</td>
<td>PIE</td>
</tr>
<tr>
<td>6</td>
<td><a href="/player.php?id=001">Player 1</a></td>
<td>30:22</td>
<td>18</td>
<td>4/10</td>
<td>40%</td>
<td>2/6</td>
<td>33%</td>
<td>4/4</td>
<td>100%</td>
<td>9</td>
<td>5+4</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>3</td>
<td>4</td>
<td>10</td>
<td>20</td>
</tr>
<tr>
<td>6</td>
<td><a href="/player.php?id=002">Player 2</a></td>
<td>30:22</td>
<td>18</td>
<td>4/10</td>
<td>40%</td>
<td>2/6</td>
<td>33%</td>
<td>4/4</td>
<td>100%</td>
<td>9</td>
<td>5+4</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>3</td>
<td>4</td>
<td>10</td>
<td>20</td>
</tr>
...
...
<tr class="statsgreen">
<td colspan="10" class="estverdel">Team B</td>
<td colspan="2">REB</td>
<td colspan="4"> </td>
<td colspan="2">BLK</td>
<td> </td>
<td colspan="2">PF</td>
<td> </td>
<td> </td>
</tr>
<tr class="statsgreen">
<td>Num</td>
<td>Name</td>
<td>Min</td>
<td>GS</td>
<td>T2</td>
<td>T2 %</td>
<td>T3</td>
<td>T3 %</td>
<td>T1</td>
<td>T1 %</td>
<td>T</td>
<td>D+O</td>
<td>A</td>
<td>ST</td>
<td>LO</td>
<td>C</td>
<td>R</td>
<td>C</td>
<td>M</td>
<td>R</td>
<td>C</td>
<td>+/-</td>
<td>PIE</td>
</tr>
<tr>
<td>6</td>
<td><a href="/player.php?id=013">Player 13</a></td>
<td>30:22</td>
<td>18</td>
<td>4/10</td>
<td>40%</td>
<td>2/6</td>
<td>33%</td>
<td>4/4</td>
<td>100%</td>
<td>9</td>
<td>5+4</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>3</td>
<td>4</td>
<td>10</td>
<td>20</td>
</tr>
</table>
Here is my incomplete VB.NET code, which only gets the team and player names:
Private Sub btnGetStats_Click(sender As Object, e As EventArgs) Handles btnGetStats.Click
    Dim doc As New HtmlDocument
    doc.Load("C:\001.html")
    'Get team names
    For Each nodeteams As HtmlNode In doc.DocumentNode.SelectNodes("//td[@class=""estverdel""]")
        MessageBox.Show("Team: " & nodeteams.InnerText)
    Next
    'Get player names
    For Each nodeplayers As HtmlNode In doc.DocumentNode.SelectNodes("//a[contains(@href, '/player')]")
        MessageBox.Show(nodeplayers.InnerText)
    Next
End Sub
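Is there an XPath expression I can use to select each player node and then go through each of the 21 stat fields that follow it?
Alternatively, I suppose I could get nodeplayers.Line and then read the next 21 lines with System.IO.StreamReader, but perhaps HAP can do this in a smarter way.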
Answer (score: 0)
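One possibility is to use the ParentNode property of the player's HtmlNode object: starting from the player link (<tr><td><a player>...), go up to its td node and then to the tr node, skip the first two td nodes (Num and Name), and take the rest. Modify your second loop like this: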
'Get player names and their stats
For Each nodeplayers As HtmlNode In doc.DocumentNode.SelectNodes("//a[contains(@href, '/player')]")
    Console.WriteLine("Player: " & nodeplayers.InnerText)
    ' a -> parent td -> parent tr; keep only the td children (ChildNodes also contains
    ' whitespace text nodes), skip the first two cells (Num, Name) and take the rest.
    ' Where and Skip come from System.Linq.
    For Each node As HtmlNode In nodeplayers.ParentNode.ParentNode.ChildNodes.Where(Function(n) n.Name = "td").Skip(2)
        Console.WriteLine(node.InnerText)
    Next
Next
This returns all the values for each player:
Team: Team A
Team: Team B
Player: Player 1
30:22
18
4/10
40%
2/6
33%
4/4
100%
9
5+4
1
1
0
0
0
0
0
3
4
10
20
...
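If you also need to know which team each player belongs to, one option is to walk the rows of the table in document order instead of searching for the links directly, and to read the stat cells with the XPath following-sibling axis, which HAP evaluates relative to the node you call SelectNodes on. This is a sketch that assumes the table layout shown in the question. PlayerStats, ParseStatsTable and SampleRow are hypothetical names, and the inline HTML is a trimmed copy of the question's table rather than the real page:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module TeamStatsSketch

    ' Hypothetical container for one player row; the property names are illustrative.
    Public Class PlayerStats
        Public Property Team As String
        Public Property Name As String
        Public Property Stats As List(Of String)
    End Class

    ' Walks the rows of table.stats in document order. A row with td.estverdel starts
    ' a new team; a row with a /player link is a player row; other rows are skipped.
    Public Function ParseStatsTable(doc As HtmlDocument) As List(Of PlayerStats)
        Dim result As New List(Of PlayerStats)()
        ' "//tr" (not "/tr") also matches rows wrapped in a <tbody>.
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//table[@class='stats']//tr")
        If rows Is Nothing Then Return result ' SelectNodes may return Nothing when nothing matches

        Dim currentTeam As String = ""
        For Each row As HtmlNode In rows
            Dim teamCell As HtmlNode = row.SelectSingleNode("td[@class='estverdel']")
            If teamCell IsNot Nothing Then
                currentTeam = teamCell.InnerText.Trim()
                Continue For
            End If

            Dim link As HtmlNode = row.SelectSingleNode("td/a[contains(@href, '/player')]")
            If link Is Nothing Then Continue For ' column-header row (Num, Name, ...)

            ' following-sibling axis: every <td> after the cell that holds the name.
            Dim statCells As HtmlNodeCollection = link.ParentNode.SelectNodes("following-sibling::td")
            Dim stats As New List(Of String)()
            If statCells IsNot Nothing Then
                stats = statCells.Select(Function(td) td.InnerText.Trim()).ToList()
            End If

            result.Add(New PlayerStats With {.Team = currentTeam, .Name = link.InnerText.Trim(), .Stats = stats})
        Next
        Return result
    End Function

    ' Builds one player row with the 21 stat values from the question's sample table.
    Public Function SampleRow(id As String, name As String) As String
        Dim stats As String() = {"30:22", "18", "4/10", "40%", "2/6", "33%", "4/4", "100%", "9", "5+4",
                                 "1", "1", "0", "0", "0", "0", "0", "3", "4", "10", "20"}
        Return "<tr><td>6</td><td><a href=""/player.php?id=" & id & """>" & name & "</a></td>" &
               String.Concat(stats.Select(Function(s) "<td>" & s & "</td>")) & "</tr>"
    End Function

    Sub Main()
        ' Trimmed copy of the question's table: two teams, header rows shortened.
        Dim html As String = "<table class=""stats"" cellspacing=""0"">" &
            "<tr class=""statsgreen""><td colspan=""10"" class=""estverdel"">Team A</td></tr>" &
            "<tr class=""statsgreen""><td>Num</td><td>Name</td></tr>" &
            SampleRow("001", "Player 1") & SampleRow("002", "Player 2") &
            "<tr class=""statsgreen""><td colspan=""10"" class=""estverdel"">Team B</td></tr>" &
            "<tr class=""statsgreen""><td>Num</td><td>Name</td></tr>" &
            SampleRow("013", "Player 13") & "</table>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        For Each p As PlayerStats In ParseStatsTable(doc)
            Console.WriteLine(p.Team & " / " & p.Name & " (" & p.Stats.Count & " stats): " & String.Join(", ", p.Stats.ToArray()))
        Next
    End Sub
End Module
```

Run as a console app with the HtmlAgilityPack NuGet package referenced. It should print one line per player, tagged with Team A or Team B. In the WinForms handler from the question you would call ParseStatsTable(doc) after doc.Load(...) instead. If you only want the cells and not the team, the relative expression ancestor::tr[1]/td[position() > 2], evaluated from the player link, should select the same 21 cells.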