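I am trying to scrape this NBA website: https://stats.nba.com/team/1610612738/
What I want to do is extract each player's name, NO, POS, and all the other info. The problem is that I can't find (or my code can't find) the <div ng-view> that is the parent of the <nba-stat-table> holding the table.
My code so far: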
from selenium import webdriver
from bs4 import BeautifulSoup

def get_Player():
    driver = webdriver.PhantomJS(executable_path=r'D:\Documents\Python\Web Scraping\phantomjs.exe')
    url = 'https://stats.nba.com/team/1610612738/'
    driver.get(url)
    data = driver.page_source.encode('utf-8')
    soup = BeautifulSoup(data, 'lxml')
    div1 = soup.find('div', class_="columns / small-12 / section-view-overlay")
    print(div1.find_all('div'))

get_Player()
Answer 0 (score: 2)
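Use the JSON endpoint that the page itself calls to fetch this content. The response is easier and lighter to work with, and you don't need Selenium. You can find the request in the browser's Network tab.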
import requests
import pandas as pd
r = requests.get('https://stats.nba.com/stats/commonteamroster?LeagueID=00&Season=2018-19&TeamID=1610612738', headers = {'User-Agent' : 'Mozilla/5.0'}).json()
players_info = r['resultSets'][0]
df = pd.DataFrame(players_info['rowSet'], columns = players_info['headers'])
print(df.head())
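If you only need a few fields per player, the same headers/rowSet pairing can be unpacked without pandas. The following is a minimal sketch under stated assumptions: the payload is a hand-written sample that mimics the shape used above, the header names PLAYER, NUM and POSITION are assumed rather than confirmed against the live endpoint, and roster_records is a hypothetical helper name.

```python
# Hand-written sample mimicking the shape used above: resultSets[0]
# holds "headers" and "rowSet". Header names and values are assumptions.
sample_payload = {
    "resultSets": [
        {
            "headers": ["PLAYER", "NUM", "POSITION"],
            "rowSet": [
                ["Jayson Tatum", "0", "F"],
                ["Jonathan Gibson", "3", "G"],
            ],
        }
    ]
}


def roster_records(payload):
    """Turn the first result set into a list of {header: value} dicts."""
    result_set = payload["resultSets"][0]
    headers = result_set["headers"]
    # Pair every row with the header list, one dict per player.
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


records = roster_records(sample_payload)
for record in records:
    print(record["PLAYER"], record["NUM"], record["POSITION"])
```

With a real response, you would pass the decoded JSON in place of sample_payload and check the actual header names first.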
Answer 1 (score: 1)
find_all always returns a list, and findChildren() returns all child objects of a tag, nested ones included.
In your code, replace:
div1 = soup.find('div', class_="columns / small-12 / section-view-overlay")
print(div1.find_all('div'))
with:
div = soup.find('div', {'class':"nba-stat-table__overflow"})
for tr in div.find("tbody").find_all("tr"):
    for td in tr.findChildren():
        print(td.text)
Update:
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_Player():
    driver = webdriver.PhantomJS(executable_path=r'D:\Documents\Python\Web Scraping\phantomjs.exe')
    url = 'https://stats.nba.com/team/1610612738/'
    driver.get(url)
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "nba-stat-table__overflow")))
    data = driver.page_source.encode('utf-8')
    soup = BeautifulSoup(data, 'lxml')
    div = soup.find('div', {'class':"nba-stat-table__overflow"})
    for tr in div.find("tbody").find_all("tr"):
        for td in tr.findChildren():
            print(td.text)

get_Player()
Output:
Jayson Tatum
Jayson Tatum
#0
F
6-8
208 lbs
MAR 03, 1998
21
1
Duke
Jonathan Gibson
Jonathan Gibson
#3
G
6-2
185 lbs
NOV 08, 1987
31
2
New Mexico State
....
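Each name is printed twice in the output above because findChildren() also returns nested tags, so the link inside the name cell is visited in addition to the cell itself. The following is a minimal sketch, assuming hand-written sample markup rather than the real page, that builds one list per row from the direct td children only:

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption, not copied from the site).
sample_html = """
<div class="nba-stat-table__overflow">
  <table>
    <tbody>
      <tr><td class="player"><a href="#">Jayson Tatum</a></td><td>#0</td><td>F</td></tr>
      <tr><td class="player"><a href="#">Jonathan Gibson</a></td><td>#3</td><td>G</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
div = soup.find("div", {"class": "nba-stat-table__overflow"})

rows = []
for tr in div.find("tbody").find_all("tr"):
    # recursive=False limits the search to direct children of the row,
    # so the <a> nested inside the name cell is not visited separately.
    cells = tr.find_all("td", recursive=False)
    rows.append([td.get_text(strip=True) for td in cells])

for row in rows:
    print(row)
```

Keeping one list per row also makes it easier to map values to columns later than printing every cell on its own line.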
Answer 2 (score: 0)
Why look for every div? If all you want to extract is the player names, you can use this CSS selector:
td.player a
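As a minimal sketch of that selector in use, assuming hand-written sample markup in which the name cell carries the class player (the real page's markup may differ), BeautifulSoup's select() returns the matching links:

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption, not copied from the site).
sample_html = """
<table>
  <tbody>
    <tr><td class="player"><a href="#">Jayson Tatum</a></td><td>#0</td></tr>
    <tr><td class="player"><a href="#">Jonathan Gibson</a></td><td>#3</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# select() takes a CSS selector and returns every matching tag:
# here, each <a> that sits inside a <td class="player">.
names = [a.get_text(strip=True) for a in soup.select("td.player a")]
print(names)
```

On the live page the table is rendered by JavaScript, so the selector would have to run against the page source obtained after the table has loaded.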