import urllib2
import pandas as pd
from bs4 import BeautifulSoup
x = 0
i = 1
data = []
while (i < 13):
    soup = BeautifulSoup(urllib2.urlopen(
        'http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex=' % i, +str(x)).read(), 'html')
    tableStats = soup.find("table", ("class", "playerTableTable tableBody"))
    for row in tableStats.findAll('tr')[2:]:
        col = row.findAll('td')
        try:
            name = col[0].a.string.strip()
            opp = col[1].a.string.strip()
            rec = col[10].string.strip()
            yds = col[11].string.strip()
            dt = col[12].string.strip()
            pts = col[13].string.strip()
            data.append([name, opp, rec, yds, dt, pts])
        except Exception as e:
            pass
    df = pd.DataFrame(data=data, columns=[
        'PLAYER', 'OPP', 'REC', 'YDS', 'TD', 'PTS'])
    df
    i += 1
I've been working on a fantasy football project and am trying to step through the data for every week of the season, so I can build a dataframe of the top 40 players for each week. By manually typing the week number into the PeriodId part of the URL I can pull any single week, but I'm trying to increment it programmatically for each week to make this easier. I've tried PeriodId=' + i + ' and PeriodId=%d, but I keep running into various errors about concatenating str and int and unsupported operand types. Any suggestions or tips?
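For reference, the error in its simplest form looks like this; a minimal sketch where the URL is shortened and i stands for the int week counter used in the code above:

# Minimal sketch of the str/int error described above; i is the int week counter.
i = 1
# url = 'http://games.espn.com/ffl/tools/projections?&scoringPeriodId=' + i       # TypeError: str + int
url = 'http://games.espn.com/ffl/tools/projections?&scoringPeriodId=' + str(i)    # works once the int is converted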
Answer 0 (score: 0)
Try removing the comma between %i and str(x) so that the strings are concatenated, and see if that helps.
soup = BeautifulSoup(urllib2.urlopen('http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex='%i, +str(x)).read(), 'html')
should be:
soup = BeautifulSoup(urllib2.urlopen('http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex='%i +str(x)).read(), 'html')
Answer 1 (score: 0)
If you're having trouble concatenating or formatting the URL, build it in a separate variable rather than writing everything on one line inside the BeautifulSoup and urllib2.urlopen calls. Use parentheses to format multiple values, e.g. "before %s is %s" % (1, 0).
url = 'http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%s&seasonId=2018&startIndex=%s' % (i, x)
# or
#url = 'http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%s&seasonId=2018&startIndex=0' % i
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
It makes the code clearer and has no impact on performance.
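Putting it together, here is a minimal sketch of what the question's loop might look like with the URL built as a separate, formatted variable (still Python 2 / urllib2 as in the question; the class filter is passed as a dict, and the column indices are unchanged from the original code):

# Sketch: the original loop with the URL built as a separate formatted string
# (Python 2 / urllib2, matching the question's code).
import urllib2
import pandas as pd
from bs4 import BeautifulSoup

x = 0                                      # startIndex (pagination offset)
data = []
for i in range(1, 13):                     # scoringPeriodId = weeks 1 through 12
    url = ('http://games.espn.com/ffl/tools/projections?'
           '&slotCategoryId=4&scoringPeriodId=%s&seasonId=2018&startIndex=%s' % (i, x))
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tableStats = soup.find('table', {'class': 'playerTableTable tableBody'})
    for row in tableStats.findAll('tr')[2:]:
        col = row.findAll('td')
        try:
            data.append([col[0].a.string.strip(),   # PLAYER
                         col[1].a.string.strip(),   # OPP
                         col[10].string.strip(),    # REC
                         col[11].string.strip(),    # YDS
                         col[12].string.strip(),    # TD
                         col[13].string.strip()])   # PTS
        except Exception:
            pass

df = pd.DataFrame(data, columns=['PLAYER', 'OPP', 'REC', 'YDS', 'TD', 'PTS'])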