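I have a problem that I can solve, but the way I have coded it feels clumsy and somewhat resource-hungry. I would like to know whether an approach that I think should work in principle can be made to work; so far I have not been able to code it correctly.
The problem is with the following code: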
from bs4 import BeautifulSoup as bsoup
import requests as reqs
pagetoparse = 'https://fbref.com/en/squads/986a26c1/Northampton-Town'
page = reqs.get(pagetoparse)
status = page.status_code
parsepage = bsoup(page.content, 'html.parser')
playerlist = []
positionlist = []
agelist = []
# Create playerlist - unique instances
findplayers = parsepage.find_all('th',attrs={"data-stat":"player"})
for player in findplayers:
    addplayer = player.find_next('a').get_text()
    if addplayer not in playerlist and addplayer != 'coverage note':
        playerlist.append(addplayer)
# Create positionlist - non-unique
findinfo = parsepage.find_all('td',attrs={"data-stat":'position'})
for position in findinfo:
    addposition = position.get_text()
    if addposition != 'coverage note':
        positionlist.append(addposition)
# Create agelist - non-unique
findinfo = parsepage.find_all('td',attrs={"data-stat":'age'})
for age in findinfo:
    addage = age.get_text()
    if addage != 'coverage note':
        agelist.append(addage)
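What I am doing at the moment works, but I would like to loop over the full set of data-stat options held in a list:
toparse = ['player', 'position', 'age'] and so on
However, I cannot get this to work in a way that adds each value to its own list. I can write a for loop that does it, but everything ends up in the same list. While looping over the data-stat values, is there a way to switch the target list as well, i.e. have the code move from playerlist to positionlist and so on?
I have managed to achieve this by running the code separately for each stat. However, that lacks flexibility and, I would say, becomes too cumbersome.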
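One way to sketch the idea described above is to replace the separate list variables with a dictionary of lists keyed by the data-stat name, so the loop variable itself selects the target list. This is only a minimal sketch: SAMPLE_HTML and collect_stats are hypothetical names, and the inline table is a simplified stand-in for the real page, assumed to use the same th/td cells with data-stat attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the squad table (assumption:
# the real page marks cells with the same data-stat attributes).
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th data-stat="player">Player</th><th data-stat="position">Pos</th><th data-stat="age">Age</th></tr>
  </thead>
  <tbody>
    <tr><th data-stat="player"><a href="#">David Cornell</a></th><td data-stat="position">GK</td><td data-stat="age">27</td></tr>
    <tr><th data-stat="player"><a href="#">Aaron Pierre</a></th><td data-stat="position">DF</td><td data-stat="age">25</td></tr>
  </tbody>
</table>
"""

def collect_stats(html, stats):
    """Return {stat: [values...]} with one list per data-stat name."""
    soup = BeautifulSoup(html, "html.parser")
    columns = {stat: [] for stat in stats}
    for stat in stats:
        # Filtering on the attribute alone matches both <th> and <td> cells.
        for cell in soup.find_all(attrs={"data-stat": stat}):
            if cell.find_parent("thead") is not None:
                continue  # skip the column header cells
            columns[stat].append(cell.get_text(strip=True))
    return columns

columns = collect_stats(SAMPLE_HTML, ["player", "position", "age"])
print(columns["player"])
print(columns["position"])
```

Here columns['player'] plays the role of playerlist, columns['position'] of positionlist, and so on, so adding another stat only means adding another name to the list passed in.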
Answer 0 (score: 0)
Use the find_next function to get the next matching element after the current one.
from bs4 import BeautifulSoup as bsoup
import requests as reqs
pagetoparse = 'https://fbref.com/en/squads/986a26c1/Northampton-Town'
page = reqs.get(pagetoparse)
parsepage = bsoup(page.content, 'html.parser')
playerlist = []
findplayers = parsepage.find_all('th',attrs={"data-stat":"player"})
for player in findplayers:
    playerdict = {}
    addplayer = player.find_next('a').get_text()
    # Note: playerlist holds dicts, so this membership test never filters out a repeated name
    if addplayer not in playerlist and addplayer != 'coverage note':
        playerdict['player'] = addplayer
        # Walk forward through the following <td> cells until the wanted data-stat appears
        position = player.find_next('td')
        while True:
            position = position.find_next('td')
            if position.has_attr("data-stat") and position['data-stat'] == 'position':
                playerdict['position'] = position.get_text()
                break
        # Continue from the position cell to the age cell
        while True:
            position = position.find_next('td')
            if position.has_attr("data-stat") and position['data-stat'] == 'age':
                playerdict['age'] = position.get_text()
                break
        playerlist.append(playerdict)
print(playerlist)
Output:
[{'player':'David Cornell','position':'GK','age':'27'},{'player':'David Cornell','position':'GK','age':'27'},
{'player':'Aaron Pierre','position':'DF','age':'25'},{'player':'Sam Hoskins','position':'FW','age':'25'},
{'player':'David Buchanan','position':'DF','age':'32'},{'player':'Sam Foley','position':'MF','age':'31'},
{'player':'Ash Taylor','position':'MF,DF','age':'27'},{'player':'Jordan Turnbull','position':'DF','age':'23'},
{'player':'Andy Williams','position':'MF,FW','age':'31'},{'player':"John-Joe O'Toole",'position':'MF','age':'29'},
{'player':'Shay Facey','position':'DF','age':'23'},{'player':'Shaun McWilliams','position':'MF','age':'19'},
{'player':'Kevin van Veen','position':'FW','age':'27'},{'player':'Matt Crooks','position':'MF,DF','age':'24'},
{'player':'Daniel Powell','position':'MF,FW','age':'27'},{'player':'Jack Bridge','position':'FW','age':'22'},
{'player':'Charlie Goode','position':'DF','age':'22'},{'player':'Hakeem Odoffin','position':'DF','age':'20'},
{'player':'Dean Bowditch','position':'FW','age':'32'},{'player':'Junior Morias','position':'FW','age':'23'},
{'player':'Jay Williams','position':'DF','age':''},{'player':'Joe Powell','position':'MF','age':'19'},
{'player':'Billy Waters','position':'MF,FW','age':'23'},{'player':'Marvin Sordell','position':'FW','age':'27'},
{'player':'Timi Elšnik','position':'MF','age':'20'},{'player':'Leon Barnett','position':'DF','age':'32'},
{'player':'Scott Pollock','position':'MF','age':''},{'player':'George Cox','position':'DF','age':''},
{'player':'Ryan Hughes','position':'MF','age':''},{'player':'Morgan Roberts','position':'','age':''},
{'player':'David Cornell','position':'GK','age':'27'},{'player':'David Cornell','position':'GK','age':'27'}]
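David Cornell appears more than once in the output above. A possible variation, offered only as a sketch, is to read each table row once, look up the wanted cells by their data-stat attribute within that row, and track the names already seen in a set. SAMPLE_HTML and parse_rows are hypothetical names, and the inline markup is a simplified assumption about the page structure (two tables that repeat a player, plus one row with an empty age).

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the second table repeats a player,
# and one row has an empty age cell.
SAMPLE_HTML = """
<table>
  <thead><tr><th data-stat="player">Player</th><th data-stat="position">Pos</th><th data-stat="age">Age</th></tr></thead>
  <tbody>
    <tr><th data-stat="player"><a href="#">David Cornell</a></th><td data-stat="position">GK</td><td data-stat="age">27</td></tr>
    <tr><th data-stat="player"><a href="#">Jay Williams</a></th><td data-stat="position">DF</td><td data-stat="age"></td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th data-stat="player">Player</th><th data-stat="position">Pos</th><th data-stat="age">Age</th></tr></thead>
  <tbody>
    <tr><th data-stat="player"><a href="#">David Cornell</a></th><td data-stat="position">GK</td><td data-stat="age">27</td></tr>
  </tbody>
</table>
"""

def parse_rows(html, stats=("player", "position", "age"), key="player"):
    """Build one dict per unique player, reading cells row by row."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    seen = set()  # names already added, for a cheap duplicate check
    for row in soup.find_all("tr"):
        if row.find_parent("thead") is not None:
            continue  # header rows carry labels, not player data
        record = {}
        for stat in stats:
            cell = row.find(attrs={"data-stat": stat})
            # A missing cell is stored as an empty string.
            record[stat] = cell.get_text(strip=True) if cell is not None else ""
        name = record.get(key, "")
        if not name or name in seen:
            continue
        seen.add(name)
        records.append(record)
    return records

records = parse_rows(SAMPLE_HTML)
print(records)
```

Because every lookup is confined to one row, a missing cell cannot pull in a value from the next player, and supporting another stat only needs another name in stats.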