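I'm trying to scrape results from the BBC Sport website. I've got the scores working, but when I try to add the team names, the program prints None 1-0 None (for example). Here is the code: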
from bs4 import BeautifulSoup
import urllib.request
import csv
url = 'http://www.bbc.co.uk/sport/football/teams/derby-county/results'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page)
for match in soup.select('table.table-stats tr.report'):
    team1 = match.find('span', class_='team-home')
    team2 = match.find('span', class_='team-away')
    score = match.abbr
    print(team1.string, score.string, team2.string)
Answer 0 (score: 1)
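It looks like you are searching for tags that don't exist. For example, class_="team-home teams" appears in the HTML, but class_='team-home' does not. The following code returns the first team name: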
tables = soup.find_all("table", class_="table-stats")
tables[0].find("span", class_="team-home teams").text
# u' Birmingham '
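Another explanation worth checking is the `.string` call itself. The snippet below is a minimal sketch that assumes hypothetical markup in which each team-name span wraps a link plus surrounding whitespace (the live BBC page has changed since this was asked, so the real structure may differ). In BeautifulSoup, `class_` matches any one of a tag's classes, so `class_='team-home'` does find `<span class="team-home teams">`. However, `.string` is defined as None whenever a tag has more than one child, while `.text` / `get_text()` concatenates every descendant string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes the team span holds whitespace, an <a> tag,
# and more whitespace (three children), not the real BBC page.
html = """
<table class="table-stats">
  <tr class="report">
    <td><span class="team-home teams"> <a href="#">Birmingham</a> </span></td>
    <td><abbr>1-0</abbr></td>
    <td><span class="team-away teams"> <a href="#">Derby County</a> </span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
match = soup.select_one("table.table-stats tr.report")

# class_ matches any single class in a multi-valued class attribute,
# so "team-home" alone still finds <span class="team-home teams">.
home = match.find("span", class_="team-home")
away = match.find("span", class_="team-away")
print(home is not None)           # True

# .string is None because the span has more than one child.
print(home.string)                # None

# get_text(strip=True) joins the descendant strings, trimming whitespace.
print(home.get_text(strip=True), match.abbr.string, away.get_text(strip=True))
# Birmingham 1-0 Derby County
```

If the real page behaves like this snippet, replacing `.string` with `.get_text(strip=True)` in the original loop would be enough, without changing the class names.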
Answer 1 (score: 1)
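Here is one possible solution that uses BeautifulSoup to get the home and away team names, the final score, the match date and the competition name, and puts them into a DataFrame.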
import requests
import pandas as pd
from bs4 import BeautifulSoup
# Get the relevant webpage and set up the data for parsing
url = "http://www.bbc.co.uk/sport/football/teams/derby-county/results"
r = requests.get(url)
soup=BeautifulSoup(r.content,"lxml")
#set up a function to parse the "soup" for each category of information and put it in a DataFrame
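def get_match_info(soup, tag, class_name, column_name):
    info_array = []
    # A space-separated value such as "team-home teams" is matched against
    # the exact string of the tag's class attribute.
    for info in soup.find_all(tag, attrs={'class': class_name}):
        info_array.append({column_name: info.text})
    return pd.DataFrame(info_array)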
#for each category pass the above function the relevant information i.e. tag names
date = get_match_info(soup,"td","match-date","Date")
home_team = get_match_info(soup,"span","team-home teams","Home Team")
score = get_match_info(soup,"span","score","Score")
away_team = get_match_info(soup,"span","team-away teams","Away Team")
competition = get_match_info(soup,"td","match-competition","Competition")
#Concatenate the DataFrames to present a final table of all the above info
match_info = pd.concat([date,home_team,score,away_team,competition],ignore_index=False,axis=1)
print(match_info)
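One design point to keep in mind: `pd.concat(..., axis=1)` lines the separate column DataFrames up by row position, so if any match row is missing one of the cells (for example, no competition cell), every later value in that column would shift onto the wrong match. The sketch below is a hedged alternative rather than the answerer's method: it walks each match row and builds one record per row. The markup, the placeholder team names and dates, and the helper names `cell_text` and `parse_results` are all hypothetical; only the class names are taken from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing the class names above; the second row
# deliberately has no competition cell. All values are placeholders.
html = """
<table class="table-stats">
  <tr class="report">
    <td class="match-date">Day 1</td>
    <td><span class="team-home teams"><a>Team A</a></span>
        <span class="score">1-0</span>
        <span class="team-away teams"><a>Team B</a></span></td>
    <td class="match-competition">League</td>
  </tr>
  <tr class="report">
    <td class="match-date">Day 2</td>
    <td><span class="team-home teams"><a>Team B</a></span>
        <span class="score">2-2</span>
        <span class="team-away teams"><a>Team C</a></span></td>
  </tr>
</table>
"""


def cell_text(row, selector):
    """Return the stripped text of the first tag matching selector, or None."""
    tag = row.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


def parse_results(soup):
    """Build one dict per match row so the fields of a match stay together."""
    rows = []
    for match in soup.select("table.table-stats tr.report"):
        rows.append({
            "Date": cell_text(match, "td.match-date"),
            "Home Team": cell_text(match, "span.team-home"),
            "Score": cell_text(match, "span.score"),
            "Away Team": cell_text(match, "span.team-away"),
            "Competition": cell_text(match, "td.match-competition"),
        })
    return rows


rows = parse_results(BeautifulSoup(html, "html.parser"))
for row in rows:
    print(row)
```

A missing cell becomes None in its own row instead of misaligning the others. The resulting list of dicts can be passed straight to `pd.DataFrame(rows)` to get the same kind of table as above.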