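I am using Python and BeautifulSoup to scrape the following URL: 'https://www.pro-football-reference.com/teams/nwe/2013_injuries.htm'. I want to scrape each player's name, their injury, and the week of the injury. I can scrape the week 1 information, which gives the following result: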
[['Danny Amendola'], 'Questionable: hamstring', 'week_1']
[['Armond Armstead'], 'Out: infection', 'week_1']
[['Kyle Arrington'], 'NA', 'week_1']
[['Brandon Bolden'], 'Questionable: knee', 'week_1']
... and so on for all the week 1 injuries.
But it stops once all the week 1 injuries have been shown.
I want the results to continue through week 2, week 3, week 4, and so on.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.pro-football-reference.com/teams/nwe/2013_injuries.htm'
# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# html parsing
page_soup = soup(page_html, "html.parser")
containers = page_soup.find("tbody")
head = page_soup.find("thead")
player = containers.find_all("tr")
for tr in player:
    th = tr.find_all("th")
    name = [i.text for i in th]
    week = tr.td["data-stat"]
    try:
        injury = tr.td["data-tip"]
        print([name, injury, week])
    except KeyError:
        injury = "NA"
        print([name, injury, week])
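The result I am looking for is for the code to print each player's name, their injury, and the week of the injury for every week shown in the table at the URL. For example, once all the week 1 injuries have been printed, I want it to show all the week 2 injuries, then week 3, and so on. So it would look like this: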
[['Adrian Wilson'], 'Injured Reserve: hamstring', 'week_1']
[['Tavon Wilson'], 'NA', 'week_1']
[['Markus Zusevics'], 'Injured Reserve: undisclosed', 'week_1']
[['Danny Amendola'], 'Questionable: groin', 'week_2']
...
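Answer 0 (score: 1)
You are only reading the first td of each row: tr.td returns just the first matching cell, so only week 1 is ever seen. Loop over every td in the row instead. This should work: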
player = containers.find_all("tr")
for tr in player:
    th = tr.find_all("th")
    name = [i.text for i in th]
    # visit every week cell in the row, not just the first one
    for td in tr.find_all("td"):
        week = td["data-stat"]
        try:
            injury = td["data-tip"]
            print([name, injury, week])
        except KeyError:
            injury = "NA"
            print([name, injury, week])
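Note that the nested loop above prints every week for one player before moving on to the next player, whereas the question asks for all week 1 rows first, then week 2, and so on. One possible way to get that order, sketched here under the assumption that the records are appended to a list instead of printed, is to sort them by week number afterwards. The helper name week_number is hypothetical, and the sample records are taken from the output shown in the question.

```python
def week_number(record):
    """Return the integer week from a label such as 'week_12'."""
    return int(record[2].split("_")[1])


# Records in the order the nested loop emits them: player by player.
records = [
    [['Danny Amendola'], 'Questionable: hamstring', 'week_1'],
    [['Danny Amendola'], 'Questionable: groin', 'week_2'],
    [['Armond Armstead'], 'Out: infection', 'week_1'],
    [['Kyle Arrington'], 'NA', 'week_1'],
]

# sorted() is stable, so players keep their table order within each week.
by_week = sorted(records, key=week_number)

for record in by_week:
    print(record)
```

Comparing the week as an integer rather than as a string matters once the table reaches week_10, which would otherwise sort before week_2.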
Answer 2 (score: 0)
Code:
import re
import requests
from bs4 import BeautifulSoup as soup
html = requests.get('https://www.pro-football-reference.com/teams/nwe/2013_injuries.htm').text
overall = []
page_soup = soup(html, 'html.parser')
containers = page_soup.find('tbody')
players = containers.find_all('tr')
for player in players:
    th = player.find_all('th')
    name = [i.text for i in th]
    # keep only the week cells that carry an injury tooltip
    tds = player.find_all('td', {'class': re.compile('^center poptip')})
    # one "injury, week" pair per cell, joined into a single string
    weeklyInjuries = ', '.join(f"{td['data-tip']}, {td['data-stat']}" for td in tds)
    if not weeklyInjuries:
        weeklyInjuries = 'N/A'
    print([name, weeklyInjuries])
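To try this per-player summary without requesting the live page, a small offline sketch may help. The HTML fragment below is a hypothetical imitation of the table layout (a th with the player name followed by td cells carrying data-stat and, when there is an injury, data-tip); the real page's markup may differ. Instead of matching on the class name, this variant selects cells that have a data-tip attribute at all, which is a swapped-in technique and not what the answer above does. The function name weekly_injuries is also hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the injury table; not copied from the site.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr>
      <th data-stat="player">Danny Amendola</th>
      <td data-stat="week_1" data-tip="Questionable: hamstring">Q</td>
      <td data-stat="week_2" data-tip="Questionable: groin">Q</td>
    </tr>
    <tr>
      <th data-stat="player">Kyle Arrington</th>
      <td data-stat="week_1"></td>
    </tr>
  </tbody>
</table>
"""


def weekly_injuries(html):
    """Return one [name, 'injury, week, injury, week, ...'] entry per row."""
    page = BeautifulSoup(html, "html.parser")
    rows = []
    for row in page.find("tbody").find_all("tr"):
        name = row.find("th").get_text(strip=True)
        # attrs={"data-tip": True} matches only cells that have the attribute
        cells = row.find_all("td", attrs={"data-tip": True})
        summary = ", ".join(f"{c['data-tip']}, {c['data-stat']}" for c in cells)
        rows.append([name, summary or "N/A"])
    return rows


summaries = weekly_injuries(SAMPLE_HTML)
for row in summaries:
    print(row)
```

Here the name is kept as a plain string rather than a one-element list, which is a small departure from the code above.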