我最终从代码中的URL尝试从页面中收集所有玩家名称。但是,当我使用.findAll来获取所有列表元素时,我还没有成功。请告知。
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
players_url = 'https://stats.nba.com/players/list/?Historic=Y'
# Opening up the Connection and grabbing the page
uClient = uReq(players_url)
page_html = uClient.read()
players_soup = soup(page_html, "html.parser")
# Taking all of the elements from the unordered lists that contains all of the players.
list_elements = players_soup.findAll('li', {'class': 'players-list__name'})
答案 0 :(得分:3)
如@Oluwafemi Sule所建议,最好将selenium
与BS
一起使用:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('https://stats.nba.com/players/list/?Historic=Y')
soup = BeautifulSoup(driver.page_source, 'lxml')
for div in soup.findAll('li', {'class': 'players-list__name'}):
print(div.find('a').contents[0])
输出:
Abdelnaby, Alaa
Abdul-Aziz, Zaid
Abdul-Jabbar, Kareem
Abdul-Rauf, Mahmoud
Abdul-Wahad, Tariq
等
答案 1 :(得分:1)
您可以通过直接从提供名称的js脚本中直接提取请求来完成此操作。
import requests
import json
r = requests.get('https://stats.nba.com/js/data/ptsd/stats_ptsd.js')
s = r.text.replace('var stats_ptsd = ','').replace('};','}')
data = json.loads(s)['data']['players']
players = [item[1] for item in data]
print(players)
答案 2 :(得分:0)
如评论中的@Oluwafemi Sule所述):
页面中生成的播放器列表是使用javascript完成的。
我建议您不要使用Selenium,而是由非常受欢迎的requests-html的作者创建的此软件包requests。它在后台使用Chromium渲染JavaScript内容。
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://stats.nba.com/players/list/?Historic=Y')
r.html.render()
for anchor in r.html.find('.players-list__name > a'):
print(anchor.text)
输出:
Abdelnaby, Alaa
Abdul-Aziz, Zaid
Abdul-Jabbar, Kareem
Abdul-Rauf, Mahmoud
Abdul-Wahad, Tariq
...