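After some searching, I worked out that what I am trying to scrape sits inside an iframe. That is the main reason I kept getting None back as my result. I have been able to start extracting some data, such as the header, but when it comes to the data in the table I only get the first result, which is the number 1. Here is the code: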
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get('http://www.nhl.com/stats/player?aggregate=1&reportType=game&dateFrom=2017-10-20&dateTo=2017-10-31&filter=gamesPlayed,gte,1&sort=shots')
html = driver.page_source
driver.quit()
soup = BeautifulSoup(html,"html.parser")
stat_cat = soup.find('div',attrs={'class':'rt-tr'})
header = stat_cat.text.strip()
stats = soup.find('div',attrs={'class':'rt-td'})
player_stats = stats.text.strip()
print(header,player_stats)
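What I am trying to figure out is how to scrape each player and his stats with the second soup call, but it only returns the first rt-td result. Once I have all the data, I don't want to print it; I want to save it to a CSV file instead. Thanks for taking a look!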
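For reference, `soup.find` returns only the first matching element, while `soup.find_all` returns every match, which would explain why only the first `rt-td` cell comes back. A minimal sketch on a made-up HTML fragment follows; the markup and the `rows_from_html` helper are assumptions for illustration, not the real NHL page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the table's structure; the real page differs.
SAMPLE_HTML = """
<div class="rt-table">
  <div class="rt-tr"><div class="rt-td">1</div><div class="rt-td">Brent Burns</div><div class="rt-td">31</div></div>
  <div class="rt-tr"><div class="rt-td">2</div><div class="rt-td">Max Pacioretty</div><div class="rt-td">29</div></div>
</div>
"""

def rows_from_html(html):
    """Return one list of cell texts per rt-tr row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all yields every matching row, not just the first one
    for tr in soup.find_all("div", attrs={"class": "rt-tr"}):
        cells = [td.text.strip() for td in tr.find_all("div", attrs={"class": "rt-td"})]
        if cells:  # skip rows that contain no rt-td cells, e.g. a header row
            rows.append(cells)
    return rows

rows = rows_from_html(SAMPLE_HTML)
for row in rows:
    print(row)
```

With the Selenium code above, the same helper could be given `driver.page_source` instead of `SAMPLE_HTML`, assuming the table has finished rendering by then.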
Answer 0 (score: 1)
Give this a try. If you want all the data from that table, you can run this script.
import csv
import requests
outfile = open("table_data.csv","a",newline='')
writer = csv.writer(outfile)
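# Header row: one label per column written in the loop below.
writer.writerow(["Player","Pos","GP","G","A","P","+/-","PIM","PPG","PPP","SHG","SHP","GWG","OTG","S","S%","TOI/GP","Shifts/GP","FOW%"])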
req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=true&reportType=basic&isGame=true&reportName=skatersummary&sort=[{%22property%22:%22shots%22,%22direction%22:%22DESC%22}]&cayenneExp=gameDate%3E=%222017-10-20%22%20and%20gameDate%3C=%222017-10-31%22%20and%20gameTypeId=2')
data = req.json()['data']
# Each item is one player's aggregated stats for the date range.
for item in data:
    Player = item['playerName']
    Pos = item['playerPositionCode']
    GP = item['gamesPlayed']
    G = item['goals']
    A = item['assists']
    P = item['points']
    Plus_Minus = item['plusMinus']
    PIM = item['penaltyMinutes']
    PPG = item['ppGoals']
    PPP = item['ppPoints']
    SHG = item['shGoals']
    SHP = item['shPoints']
    GWG = item['gameWinningGoals']
    OTG = item['otGoals']
    S_down = item['shots']
    S_per = item['shootingPctg']
    TOI = item['timeOnIcePerGame']
    Shifts = item['shiftsPerGame']
    FOW = item['faceoffWinPctg']
    print(Player,Pos,GP,G,A,P,Plus_Minus,PIM,PPG,PPP,SHG,SHP,GWG,OTG,S_down,S_per,TOI,Shifts,FOW)
    writer.writerow([Player,Pos,GP,G,A,P,Plus_Minus,PIM,PPG,PPP,SHG,SHP,GWG,OTG,S_down,S_per,TOI,Shifts,FOW])
outfile.close()
Partial output:
Brent Burns D 6 0 5 5 -3 4 0 3 0 0 0 0 31 0.0 1458.8333 29.0 0.0
Max Pacioretty L 5 3 0 3 0 4 0 0 1 1 0 0 29 0.1034 1240.8 26.2 0.0
Phil Kessel R 6 2 4 6 -1 4 0 4 0 0 2 2 27 0.074 1044.3333 21.5 0.3333
Jakub Voracek R 5 2 4 6 2 8 0 0 0 0 0 0 26 0.0769 1191.2 25.4 1.0
John Carlson D 5 0 3 3 -3 2 0 1 0 0 0 0 25 0.0 1686.2 29.4 0.0
Evgeny Kuznetsov C 5 3 1 4 -1 6 0 1 0 0 1 0 24 0.125 1138.4 20.2 0.3703
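The per-column variables in the script can also be replaced by a single list of JSON keys. This is a minimal sketch, assuming each item is a dict with the keys used in the script. The `FIELDS` list (shortened here), the `rows_to_csv` helper, and the sample record are illustrative, with values taken from the first output row:

```python
import csv
import io

# A subset of the JSON keys used in the script, in output order.
FIELDS = ["playerName", "playerPositionCode", "gamesPlayed",
          "goals", "assists", "points", "shots"]

def rows_to_csv(items, fields, outfile):
    """Write a header plus one row per item, keeping only the listed fields."""
    # extrasaction="ignore" drops keys that are not listed in fields
    writer = csv.DictWriter(outfile, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)

# Sample record standing in for req.json()['data'].
sample = [{"playerName": "Brent Burns", "playerPositionCode": "D",
           "gamesPlayed": 6, "goals": 0, "assists": 5, "points": 5,
           "shots": 31, "plusMinus": -3}]

buffer = io.StringIO()  # in-memory stand-in for a real file
rows_to_csv(sample, FIELDS, buffer)
print(buffer.getvalue())
```

To write to disk instead, the buffer could be swapped for `open("table_data.csv", "w", newline="")`.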