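I'm using a lightly edited version of Daniel Rodriguez's code. I'm trying to get all NBA box score data from 2014. The code has two earlier parts: the first grabs all the team names, and the second grabs every game, with the ESPN game ID, date, home team, home score, visiting team, and visiting score. Both parts work fine.
I then try to run the part that gets the box score data for each game from its game ID. It works for most games, then stops on a seemingly random game with the error: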
AttributeError: 'NoneType' object has no attribute 'find_all'
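I say random because when I run the same code over and over, it never stops on the same box score. Each time it fails on a different one.
Here is the code (the line marked with ** is where the error occurs):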
import numpy as np
import pandas as pd
import requests
import time
from bs4 import BeautifulSoup
import os
os.chdir('C:\Users\steven2r\Documents\Python')
games = pd.read_csv('games.csv').set_index('id')
BASE_URL = 'http://espn.go.com/nba/boxscore?gameId={0}'
request = requests.get(BASE_URL.format(games.index[0]))
table = BeautifulSoup(request.text).find('table', class_='mod-data')
heads = table.find_all('thead')
headers = heads[0].find_all('tr')[1].find_all('th')[1:]
headers = [th.text for th in headers]
columns = ['id', 'team', 'player'] + headers
bad_downloads = []
players = pd.DataFrame(columns=columns)
def get_players(players, team_name):
    array = np.zeros((len(players), len(headers)+1), dtype=object)
    array[:] = np.nan
    for i, player in enumerate(players):
        cols = player.find_all('td')
        array[i, 0] = cols[0].text.split(',')[0]
        for j in range(1, len(headers) + 1):
            if not cols[1].text.startswith('DNP'):
                array[i, j] = cols[j].text
    frame = pd.DataFrame(columns=columns)
    for x in array:
        line = np.concatenate(([index, team_name], x)).reshape(1,len(columns))
        new = pd.DataFrame(line, columns=frame.columns)
        frame = frame.append(new)
    return frame
for index, row in games.iterrows():
    print(index)
    request = requests.get(BASE_URL.format(index))
    table = BeautifulSoup(request.text).find('table', class_='mod-data')
    if table == []:
        print index, 'bad'
        bad_downloads.append(index)
    else:
        heads = table.find_all('thead')  # ** error occurs here
        bodies = table.find_all('tbody')
        team_1 = heads[0].th.text
        team_1_players = bodies[0].find_all('tr') + bodies[1].find_all('tr')
        team_1_players = get_players(team_1_players, team_1)
        players = players.append(team_1_players)
        team_2 = heads[3].th.text
        team_2_players = bodies[3].find_all('tr') + bodies[4].find_all('tr')
        team_2_players = get_players(team_2_players, team_2)
        players = players.append(team_2_players)
players = players.set_index('id')
print(players)
players.to_csv('players.csv')
print bad_downloads
Answer 0 (score: 0)
See "Problems Parsing NBA Boxscore Data with BeautifulSoup". It appears that BeautifulSoup is not fully compatible with ESPN's pages. The linked question offers an alternative solution.
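Separately from the linked approach, one detail in the question's code is worth noting: BeautifulSoup's `find` returns `None`, not an empty list, when nothing matches, so the `if table == []` check never fires and the `else` branch runs with `table` set to `None`. Below is a minimal sketch of a guard with retries. It assumes the intermittent failures come from ESPN occasionally serving a page without the `mod-data` table, which the question does not confirm. The names `find_boxscore_table`, `fetch_with_retry`, `download`, `parse`, `retries`, and `delay` are hypothetical and not part of the original code.

```python
import time


def find_boxscore_table(html):
    """Return the first <table class="mod-data"> element, or None if absent."""
    # Imported lazily so the retry helper below has no third-party dependency.
    from bs4 import BeautifulSoup
    return BeautifulSoup(html, 'html.parser').find('table', class_='mod-data')


def fetch_with_retry(game_id, download, parse=find_boxscore_table,
                     retries=3, delay=1.0):
    """Download and parse a page, retrying while the table is missing.

    `download` is any callable that takes a game ID and returns HTML text.
    Returns the parsed table, or None if every attempt came back empty.
    """
    for attempt in range(retries):
        table = parse(download(game_id))
        if table is not None:  # find() signals "no match" with None, not []
            return table
        if attempt < retries - 1:
            time.sleep(delay)  # brief pause before asking again
    return None
```

In the question's loop this could be called as `table = fetch_with_retry(index, lambda gid: requests.get(BASE_URL.format(gid)).text)`, appending `index` to `bad_downloads` whenever the result is `None`, so a missing table is recorded instead of raising the `AttributeError`.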