Using Beautiful Soup, but unable to access some of the information

Time: 2018-06-27 01:53:58

Tags: javascript python web-scraping beautifulsoup

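I am currently writing a data scraper to collect hockey statistics. On the page I am scraping, some of the statistics only become visible when you click a JavaScript button, but when I parse the page with Beautiful Soup I can see that all of the data is in the soup variable. Here is an example of what I am trying to access (copied and pasted from print(soup)):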

<tr class="ALL5v5 hidden" ><th scope="row" class="left " data-append-csv="bozakty01" data-stat="player" csk="Bozak,Tyler" ><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents" >0</td><td class="right " data-stat="on_Cevents" >4</td><td class="right " data-stat="on_opp_Cevents" >5</td><td class="right " data-stat="corsi_for" >44.4</td><td class="right " data-stat="corsi_rel" >-2.5</td><td class="right " data-stat="zs_off" >2</td><td class="right " data-stat="zs_def" >0</td><td class="right " data-stat="ozs_pct" >100.0</td><td class="right " data-stat="hits" >2</td><td class="right " data-stat="blocks" >0</td></tr>
<tr class="CL5v5 hidden" ><th scope="row" class="left " data-append-csv="bozakty01" data-stat="player" csk="Bozak,Tyler" ><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents" >0</td><td class="right " data-stat="on_Cevents" >2</td><td class="right " data-stat="on_opp_Cevents" >4</td><td class="right " data-stat="corsi_for" >33.3</td><td class="right " data-stat="corsi_rel" >-10.7</td><td class="right " data-stat="zs_off" >0</td><td class="right " data-stat="zs_def" >0</td><td class="right " data-stat="ozs_pct" ></td><td class="right " data-stat="hits" >2</td><td class="right " data-stat="blocks" >0</td></tr>

To access the data I have tried:

soup.find_all('tr',{'data-stat': "on_Cevents"})
soup.find_all("tr", class_="ALL5v5 hidden")
soup.find_all({'data-stat': "Cevents"})
soup.find_all('td',{'data-stat': "Cevents"})

None of these return the information, even though all of it is contained in the soup variable.

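I can't see what the problem is. I am using the following command

players = soup.find_all('td', {'data-stat': "player"})

to access the player information, and that works fine. But I cannot access the information listed above.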

1 Answer:

Answer 0: (score: 0)

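I think the problem is somewhere else. Please check your setup again. Alternatively, it may be caused by another part of the HTML structure that you have not shown us.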
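One possibility of that kind, offered purely as an assumption since the question does not show the full page: some sites ship hidden tables inside HTML comments. In that case print(soup) still shows the rows, but find_all cannot see them because a comment is a string, not a tree of tags. A minimal sketch of how to check for this, using a made-up `html` string and wrapper `div`:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page fragment: the table is wrapped in an HTML comment.
html = (
    '<div id="wrapper"><!-- <table><tr class="ALL5v5 hidden">'
    '<td class="right " data-stat="on_Cevents">4</td>'
    '</tr></table> --></div>'
)
soup = BeautifulSoup(html, "html.parser")

# A direct search finds nothing, because the cells are not real tags yet.
direct = soup.find_all("td", {"data-stat": "on_Cevents"})

# Collect every comment node, re-parse its text, and search inside it.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
found = []
for comment in comments:
    inner = BeautifulSoup(comment, "html.parser")
    found.extend(inner.find_all("td", {"data-stat": "on_Cevents"}))

values = [td.get_text() for td in found]
print(direct)
print(values)
```

If `direct` is empty and `values` is not, the data was inside a comment. If both are empty on the real page, this is not the cause and the selectors themselves are the next thing to check.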

Here is the script I made to test the snippet you posted.

from bs4 import BeautifulSoup

html = '<tr class="ALL5v5 hidden" ><th scope="row" class="left " data-append-csv="bozakty01" data-stat="player" csk="Bozak,Tyler" ><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents" >0</td><td class="right " data-stat="on_Cevents" >4</td><td class="right " data-stat="on_opp_Cevents" >5</td><td class="right " data-stat="corsi_for" >44.4</td><td class="right " data-stat="corsi_rel" >-2.5</td><td class="right " data-stat="zs_off" >2</td><td class="right " data-stat="zs_def" >0</td><td class="right " data-stat="ozs_pct" >100.0</td><td class="right " data-stat="hits" >2</td><td class="right " data-stat="blocks" >0</td></tr><tr class="CL5v5 hidden" ><th scope="row" class="left " data-append-csv="bozakty01" data-stat="player" csk="Bozak,Tyler" ><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents" >0</td><td class="right " data-stat="on_Cevents" >2</td><td class="right " data-stat="on_opp_Cevents" >4</td><td class="right " data-stat="corsi_for" >33.3</td><td class="right " data-stat="corsi_rel" >-10.7</td><td class="right " data-stat="zs_off" >0</td><td class="right " data-stat="zs_def" >0</td><td class="right " data-stat="ozs_pct" ></td><td class="right " data-stat="hits" >2</td><td class="right " data-stat="blocks" >0</td></tr>'

soup = BeautifulSoup(html, "html.parser")

# print(soup)

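print(soup.find_all('th', {'data-stat': "player"})) # I think 'th' is correct, not 'td'
print(soup.find_all('tr',{'data-stat': "on_Cevents"})) # data-stat is set on the cells, not on the tr
print(soup.find_all("tr", class_="ALL5v5 hidden")) # matches the exact class string
print(soup.find_all({'data-stat': "Cevents"})) # the dict is passed as the tag name filter, not as attrs
print(soup.find_all('td',{'data-stat': "Cevents"})) # one Cevents cell per row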

Here is the result. Some of the code you tried does access the data.

[<th class="left " csk="Bozak,Tyler" data-append-csv="bozakty01" data-stat="player" scope="row"><a href="/players/b/bozakty01.html">Tyler Bozak</a></th>, <th class="left " csk="Bozak,Tyler" data-append-csv="bozakty01" data-stat="player" scope="row"><a href="/players/b/bozakty01.html">Tyler Bozak</a></th>]
[]
[<tr class="ALL5v5 hidden"><th class="left " csk="Bozak,Tyler" data-append-csv="bozakty01" data-stat="player" scope="row"><a href="/players/b/bozakty01.html">Tyler Bozak</a></th><td class="right " data-stat="Cevents">0</td><td class="right " data-stat="on_Cevents">4</td><td class="right " data-stat="on_opp_Cevents">5</td><td class="right " data-stat="corsi_for">44.4</td><td class="right " data-stat="corsi_rel">-2.5</td><td class="right " data-stat="zs_off">2</td><td class="right " data-stat="zs_def">0</td><td class="right " data-stat="ozs_pct">100.0</td><td class="right " data-stat="hits">2</td><td class="right " data-stat="blocks">0</td></tr>]
[]
[<td class="right " data-stat="Cevents">0</td>, <td class="right " data-stat="Cevents">0</td>]
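The two empty lists come from the selectors rather than from the data: data-stat is an attribute of the td cells, not of the tr row, and a dict given as the first positional argument is used as the tag name filter instead of as attributes. Below is a rough sketch of how the values could be collected per row, assuming the rows look like the ones above. The shortened `html` string and the `row_stats` helper are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the rows from the question.
html = (
    '<table>'
    '<tr class="ALL5v5 hidden">'
    '<th data-stat="player"><a href="/players/b/bozakty01.html">Tyler Bozak</a></th>'
    '<td data-stat="on_Cevents">4</td><td data-stat="corsi_for">44.4</td></tr>'
    '<tr class="CL5v5 hidden">'
    '<th data-stat="player"><a href="/players/b/bozakty01.html">Tyler Bozak</a></th>'
    '<td data-stat="on_Cevents">2</td><td data-stat="corsi_for">33.3</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, "html.parser")


def row_stats(row):
    """Map each data-stat name in a row to the text of its cell."""
    cells = row.find_all(["th", "td"], attrs={"data-stat": True})
    return {cell["data-stat"]: cell.get_text(strip=True) for cell in cells}


# A single class name matches the row whatever other classes it carries.
all5v5 = [row_stats(row) for row in soup.find_all("tr", class_="ALL5v5")]

# Pass the dict through attrs= to search by attribute on any tag.
on_cevents = [td.get_text() for td in soup.find_all(attrs={"data-stat": "on_Cevents"})]

print(all5v5)
print(on_cevents)
```

Searching on class_="ALL5v5" alone avoids depending on the exact order and spacing of the class attribute, which the "ALL5v5 hidden" string match relies on.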