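I'm currently writing a scraper to collect hockey statistics. On the page I'm pulling from, some of the statistics are only shown after you click a JavaScript button, but when I parse the page with BeautifulSoup I can see that all of the data is in the soup variable. Here is an example of what I'm trying to access (copied from the output of print(soup)):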
<tr class="ALL5v5 hidden" ><th scope="row" class="left " data-append-csv="bozakty01" data-stat="player" csk="Bozak,Tyler" ><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents" >0</td><td class="right " data-stat="on_Cevents" >4</td><td class="right " data-stat="on_opp_Cevents" >5</td><td class="right " data-stat="corsi_for" >44.4</td><td class="right " data-stat="corsi_rel" >-2.5</td><td class="right " data-stat="zs_off" >2</td><td class="right " data-stat="zs_def" >0</td><td class="right " data-stat="ozs_pct" >100.0</td><td class="right " data-stat="hits" >2</td><td class="right " data-stat="blocks" >0</td></tr>
<tr class="CL5v5 hidden" ><th scope="row" class="left " data-append-csv="bozakty01" data-stat="player" csk="Bozak,Tyler" ><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents" >0</td><td class="right " data-stat="on_Cevents" >2</td><td class="right " data-stat="on_opp_Cevents" >4</td><td class="right " data-stat="corsi_for" >33.3</td><td class="right " data-stat="corsi_rel" >-10.7</td><td class="right " data-stat="zs_off" >0</td><td class="right " data-stat="zs_def" >0</td><td class="right " data-stat="ozs_pct" ></td><td class="right " data-stat="hits" >2</td><td class="right " data-stat="blocks" >0</td></tr>
To access the data I have tried:
soup.find_all('tr',{'data-stat': "on_Cevents"})
soup.find_all("tr", class_="ALL5v5 hidden")
soup.find_all({'data-stat': "Cevents"})
soup.find_all('td',{'data-stat': "Cevents"})
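Although all of the information is in the soup variable, none of these calls returns it.
I can't see what the problem is. I'm using the following command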
players = soup.find_all('td', {'data-stat': "player"})
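to access the player information, and it works fine. But I can't access the information listed above.
Answer 0: (score: 0)
I think the problem lies in another part of your code, so please check your setup again. Alternatively, it may be caused by a part of the HTML structure that you haven't shown us.
Here is the script I wrote.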
from bs4 import BeautifulSoup
html = '<tr class="ALL5v5 hidden" ><th scope="row" class="left " data-append-csv="bozakty01" data-stat="player" csk="Bozak,Tyler" ><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents" >0</td><td class="right " data-stat="on_Cevents" >4</td><td class="right " data-stat="on_opp_Cevents" >5</td><td class="right " data-stat="corsi_for" >44.4</td><td class="right " data-stat="corsi_rel" >-2.5</td><td class="right " data-stat="zs_off" >2</td><td class="right " data-stat="zs_def" >0</td><td class="right " data-stat="ozs_pct" >100.0</td><td class="right " data-stat="hits" >2</td><td class="right " data-stat="blocks" >0</td></tr><tr class="CL5v5 hidden" ><th scope="row" class="left " data-append-csv="bozakty01" data-stat="player" csk="Bozak,Tyler" ><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents" >0</td><td class="right " data-stat="on_Cevents" >2</td><td class="right " data-stat="on_opp_Cevents" >4</td><td class="right " data-stat="corsi_for" >33.3</td><td class="right " data-stat="corsi_rel" >-10.7</td><td class="right " data-stat="zs_off" >0</td><td class="right " data-stat="zs_def" >0</td><td class="right " data-stat="ozs_pct" ></td><td class="right " data-stat="hits" >2</td><td class="right " data-stat="blocks" >0</td></tr>'
soup = BeautifulSoup(html, "html.parser")
# print(soup)
print(soup.find_all('th', {'data-stat': "player"})) # I think 'th' is correct, not 'td'
print(soup.find_all('tr',{'data-stat': "on_Cevents"}))
print(soup.find_all("tr", class_="ALL5v5 hidden"))
print(soup.find_all({'data-stat': "Cevents"}))
print(soup.find_all('td',{'data-stat': "Cevents"}))
Here is the result. Some of the calls you tried do return the data.
[<th class="left " csk="Bozak,Tyler" data-append-csv="bozakty01" data-stat="player" scope="row"><a href="/players/b/bozakty01.html">Tyler Bozak</a></th>, <th class="left " csk="Bozak,Tyler" data-append-csv="bozakty01" data-stat="player" scope="row"><a href="/players/b/bozakty01.html">Tyler Bozak</a></th>]
[]
[<tr class="ALL5v5 hidden"><th class="left " csk="Bozak,Tyler" data-append-csv="bozakty01" data-stat="player" scope="row"><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents">0</td><td class="right " data-stat="on_Cevents">4</td><td class="right " data-stat="on_opp_Cevents">5</td><td class="right " data-stat="corsi_for">44.4</td><td class="right " data-stat="corsi_rel">-2.5</td><td class="right " data-stat="zs_off">2</td><td class="right " data-stat="zs_def">0</td><td class="right " data-stat="ozs_pct">100.0</td><td class="right " data-stat="hits">2</td><td class="right " data-stat="blocks">0</td></tr>]
[]
[<td class="right " data-stat="Cevents">0</td>, <td class="right " data-stat="Cevents">0</td>]
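The two empty lists above have straightforward causes: data-stat="on_Cevents" is an attribute of the td cells, not of the tr rows, and a dict passed as the first positional argument to find_all is treated as a tag-name filter rather than an attribute filter. Below is a minimal sketch of how those two calls could be rewritten, and how one row might be turned into a dict. The shortened HTML, the wrapping table element, and the helper name row_to_dict are my own assumptions for illustration; they are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the rows from the question.
html = (
    '<table>'
    '<tr class="ALL5v5 hidden"><th data-stat="player">'
    '<a href="/players/b/bozakty01.html">Tyler Bozak</a></th>'
    '<td data-stat="Cevents">0</td><td data-stat="on_Cevents">4</td></tr>'
    '<tr class="CL5v5 hidden"><th data-stat="player">'
    '<a href="/players/b/bozakty01.html">Tyler Bozak</a></th>'
    '<td data-stat="Cevents">0</td><td data-stat="on_Cevents">2</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, "html.parser")

# Without a tag name, the attribute filter has to be passed as attrs=...
cevents = soup.find_all(attrs={"data-stat": "Cevents"})

# A CSS class selector matches the row regardless of its other classes.
rows = soup.select("tr.ALL5v5")


def row_to_dict(row):
    """Map each data-stat name in a row to its cell text (hypothetical helper)."""
    cells = row.find_all(["th", "td"], attrs={"data-stat": True})
    return {cell["data-stat"]: cell.get_text(strip=True) for cell in cells}


stats = [row_to_dict(row) for row in rows]
print(cevents)
print(stats)
```

Note that class_="ALL5v5 hidden" only matches when the class attribute is exactly that string, so selecting on the single class ALL5v5 is likely the more robust choice if the page ever changes the class order.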