import urllib2
import pandas as pd
from bs4 import BeautifulSoup
x = 0
i = 1
data = []
while (i < 13):
    soup = BeautifulSoup(urllib2.urlopen(
        'http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex=' % i, +str(x)).read(), 'html')
    tableStats = soup.find("table", ("class", "playerTableTable tableBody"))
    for row in tableStats.findAll('tr')[2:]:
        col = row.findAll('td')
        try:
            name = col[0].a.string.strip()
            opp = col[1].a.string.strip()
            rec = col[10].string.strip()
            yds = col[11].string.strip()
            dt = col[12].string.strip()
            pts = col[13].string.strip()
            data.append([name, opp, rec, yds, dt, pts])
        except Exception as e:
            pass
    df = pd.DataFrame(data=data, columns=[
        'PLAYER', 'OPP', 'REC', 'YDS', 'TD', 'PTS'])
    df
    i += 1
I've been working on a fantasy football project and am trying to step through the data for every week of the season, so I can build a dataframe of the top 40 players for each week. By manually typing the week number into the PeriodId part of the URL I can pull any single week, but I'm trying to increment it programmatically for each week to make this easier. I've tried PeriodId=' + i + ' and PeriodId=%d, but I keep running into various errors about concatenating str and int and unsupported operand types. Any suggestions or tips?
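For reference, the error in its simplest form looks like this; a minimal sketch where the URL is shortened and i stands for the int week counter used in the code above:

# Minimal sketch of the str/int error described above; i is the int week counter.
i = 1
# url = 'http://games.espn.com/ffl/tools/projections?&scoringPeriodId=' + i       # TypeError: str + int
url = 'http://games.espn.com/ffl/tools/projections?&scoringPeriodId=' + str(i)    # works once the int is converted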
Answer 0 (score: 0)
Try removing the comma between %i and str(x) so that the strings are concatenated, and see if that helps.
soup = BeautifulSoup(urllib2.urlopen('http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex='%i, +str(x)).read(), 'html')
should be:
soup = BeautifulSoup(urllib2.urlopen('http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex='%i +str(x)).read(), 'html')
Answer 1 (score: 0)
If you're having trouble concatenating or formatting the URL, build it in a separate variable rather than writing everything on one line inside the BeautifulSoup and urllib2.urlopen calls. Use parentheses to format multiple values, e.g. "before %s is %s" % (1, 0).
url = 'http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%s&seasonId=2018&startIndex=%s' % (i, x)
# or
#url = 'http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%s&seasonId=2018&startIndex=0' % i
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
It makes the code clearer and has no impact on performance.
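Putting it together, here is a minimal sketch of what the question's loop might look like with the URL built as a separate, formatted variable (still Python 2 / urllib2 as in the question; the class filter is passed as a dict, and the column indices are unchanged from the original code):

# Sketch: the original loop with the URL built as a separate formatted string
# (Python 2 / urllib2, matching the question's code).
import urllib2
import pandas as pd
from bs4 import BeautifulSoup

x = 0                                      # startIndex (pagination offset)
data = []
for i in range(1, 13):                     # scoringPeriodId = weeks 1 through 12
    url = ('http://games.espn.com/ffl/tools/projections?'
           '&slotCategoryId=4&scoringPeriodId=%s&seasonId=2018&startIndex=%s' % (i, x))
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tableStats = soup.find('table', {'class': 'playerTableTable tableBody'})
    for row in tableStats.findAll('tr')[2:]:
        col = row.findAll('td')
        try:
            data.append([col[0].a.string.strip(),   # PLAYER
                         col[1].a.string.strip(),   # OPP
                         col[10].string.strip(),    # REC
                         col[11].string.strip(),    # YDS
                         col[12].string.strip(),    # TD
                         col[13].string.strip()])   # PTS
        except Exception:
            pass

df = pd.DataFrame(data, columns=['PLAYER', 'OPP', 'REC', 'YDS', 'TD', 'PTS'])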