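I'm trying to write a script that loops over several tables and parses each one with Beautiful Soup in Python 2.7.
The first table parses correctly and produces the expected result. The second pass through the loop produces exactly the same result as the first.
Additional details and the script follow: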
import urllib2
import csv
from bs4 import BeautifulSoup # latest version bs4
week = raw_input("Which week?")
week = str(week)
data = []
first = "http://fantasy.nfl.com/research/projections#researchProjections=researchProjections%2C%2Fresearch%2Fprojections%253Foffset%253D"
middle = "%2526position%253DO%2526sort%253DprojectedPts%2526statCategory%253DprojectedStats%2526statSeason%253D2015%2526statType%253DweekProjectedStats%2526statWeek%253D"
last = "%2Creplace"
page_num = 1
for page_num in range(1,3):
    page_mult = (page_num-1) * 25 +1
    next = str(page_mult)
    url = first + next + middle + week + last
    print url #I added this in order to check my output
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html,"lxml")
    table = soup.find('table', attrs={'class':'tableType-player hasGroups'})
    table_body = table.find('tbody')
    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele]) # Get rid of empty values
    b = open('NFLtable.csv', 'w')
    a = csv.writer(b)
    a.writerows(data)
    b.close()
    page_num =page_num+1
print data
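Answer 0 (score: 1)
On the actual page, they use AJAX to request the additional results, and the response is JSON that carries some HTML as one of its values.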
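One likely reason both passes return the same table: everything after `#` in a URL is a fragment, which the browser keeps to itself instead of sending to the server. The sketch below rebuilds the two URLs from the question and strips the fragment with `urldefrag` to show what is left for the server to act on. The week value `"4"` and the names `page_urls` and `requested` are only there for illustration.

```python
try:
    from urllib.parse import urldefrag  # Python 3
except ImportError:
    from urlparse import urldefrag  # Python 2.7

# URL pieces copied from the question's script.
first = "http://fantasy.nfl.com/research/projections#researchProjections=researchProjections%2C%2Fresearch%2Fprojections%253Foffset%253D"
middle = "%2526position%253DO%2526sort%253DprojectedPts%2526statCategory%253DprojectedStats%2526statSeason%253D2015%2526statType%253DweekProjectedStats%2526statWeek%253D"
last = "%2Creplace"
week = "4"  # illustrative value, normally read from the user

# Build the URL for page 1 and page 2 exactly as the question's loop does.
page_urls = [
    first + str((page_num - 1) * 25 + 1) + middle + week + last
    for page_num in range(1, 3)
]

# Drop the fragment: this is the part a server can actually see.
requested = [urldefrag(url)[0] for url in page_urls]

for url in requested:
    print(url)
```

The two full URLs differ, but only inside the fragment, so the part that reaches the server is identical for both pages.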
I modified your code slightly; give it a try:
import urllib2
import urllib
import csv
from bs4 import BeautifulSoup # latest version bs4
import json
week = raw_input("Which week?")
week = str(week)
data = []
# The parameters go in a normal query string here, not after a '#'
url_format = "http://fantasy.nfl.com/research/projections?offset={offset}&position=O&sort=projectedPts&statCategory=projectedStats&statSeason=2015&statType=weekProjectedStats&statWeek={week}"
for page_num in range(1, 3):
    page_mult = (page_num - 1) * 25 + 1
    next = str(page_mult)
    url = url_format.format(week=week, offset=page_mult)
    print url # I added this in order to check my output
    # Send the request with the AJAX header; the reply is parsed as JSON below
    request = urllib2.Request(url, headers={'Ajax-Request': 'researchProjections'})
    raw_json = urllib2.urlopen(request).read()
    parsed_json = json.loads(raw_json)
    # The table HTML is stored under the 'content' key
    html = parsed_json['content']
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find('table', attrs={'class': 'tableType-player hasGroups'})
    table_body = table.find('tbody')
    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele]) # Get rid of empty values
print data
I tested it with week = 4.
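The script above prints `data` but leaves out the CSV step from the question. A minimal sketch of writing the collected rows once, after the loop has finished, could look like the following. It is written for Python 3 (under Python 2.7 the file would be opened in `'wb'` mode instead), the helper name `write_rows` is hypothetical, and the sample rows are made up to stand in for the real `data` list.

```python
import csv
import io


def write_rows(rows, out_file):
    """Write every collected row to an already-open text file object."""
    writer = csv.writer(out_file)
    writer.writerows(rows)


# Made-up rows standing in for the `data` list built by the loop.
sample_data = [
    ["Player A", "QB", "21.5"],
    ["Player B", "RB", "18.2"],
]

# In a real run this would write to disk, for example:
#     with open("NFLtable.csv", "w", newline="") as out_file:
#         write_rows(data, out_file)
# An in-memory buffer is used here so the sketch has no side effects.
buffer = io.StringIO()
write_rows(sample_data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing once at the end means the file is opened a single time and always holds the rows from every page.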