I'm analyzing a football match dataset and want to answer one question: how many goals each team scored and conceded.
My dataset:
date home_team away_team home_score away_score
1873-03-08 England Scotland 0 1
1873-03-09 Scotland England 1 0
... ... ... ... ...
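The function takes two parameters: the start year and the end year.
I tried starting with an empty list and iterating over the whole dataset, adding each country's name and appending the goals it has scored, but because there are so many different teams, my list comes out wrong.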
def total_goals(start, end):
    x = 0
    goals_scored = 0
    goals_scored_list = []
    goals_lost = 0
    goals_lost_list = []
    complete_list = []
    for item in range(len(data['home_team'])):
        date = int(data['date'][x][:4])
        if date >= start:
            if date <= end:
                if int(data['home_score'][x]) > int(data['away_score'][x]):
                    goals_scored_list.append(data['home_team'])
                    goals_scored_list.append(data['home_score'])
                x += 1
            else:
                x += 1
    return goals_scored_list
The output I want is a list containing one list per unique team, each holding the country name, the goals scored, and the goals conceded:
[['England',1,1],['Scotland',0,2],[...]]
I think I need to create a list for each unique country, perhaps using something like
if country not in data['home_team']:
    goals_scored_list.append(data['home_team'][x])
but I'm sure there is a more sophisticated way to achieve this.
Answer 0: (score: 0)
I think this should work:
class Team:
    def __init__(self, name):
        self.name = name
        self.wins = 0    # goals added when pos is True
        self.losses = 0  # goals added when pos is False

    def addEl(self, pos, score):
        try:
            score = int(score)
        except ValueError as e:
            print(e)
            return  # skip scores that cannot be parsed as integers
        if pos:
            self.wins += score
        else:
            self.losses += score

def total_goals(start, end):
    d = {}
    for i in range(len(data['home_team'])):
        date = int(data['date'][i][:4])  # year part of 'YYYY-MM-DD'
        if date >= start and date <= end:  # make sure it's in the params
            pos = int(data['home_score'][i]) > int(data['away_score'][i])  # True if home wins, False otherwise
            if data['home_team'][i] in d:  # check if home_team already exists
                d[data['home_team'][i]].addEl(pos, data['home_score'][i])  # add score to wins/losses
            else:
                d[data['home_team'][i]] = Team(data['home_team'][i])
                d[data['home_team'][i]].addEl(pos, data['home_score'][i])
            if data['away_team'][i] in d:
                d[data['away_team'][i]].addEl(not pos, data['away_score'][i])
            else:
                d[data['away_team'][i]] = Team(data['away_team'][i])
                d[data['away_team'][i]].addEl(not pos, data['away_score'][i])
    return d
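The advantage of using a custom class is that you can add more attributes, such as how many games were won or lost, other statistics, and so on.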
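As a rough illustration of that idea, here is a minimal sketch of a variation that tracks goals scored and conceded per team (what the question asks for) alongside games won and lost. It is only one possible approach, not the code from the answer above: the names `TeamStats`, `add_match` and `goals_by_team` are made up for this example, and it assumes `data` maps column names to equal-length sequences (a dict of lists, or a pandas DataFrame) with dates that start with a four-digit year.

```python
from dataclasses import dataclass


@dataclass
class TeamStats:
    """Running totals for one team (hypothetical helper for this sketch)."""
    name: str
    scored: int = 0
    conceded: int = 0
    wins: int = 0
    losses: int = 0

    def add_match(self, goals_for, goals_against):
        # Every match counts towards both totals, whatever the result.
        self.scored += goals_for
        self.conceded += goals_against
        if goals_for > goals_against:
            self.wins += 1
        elif goals_for < goals_against:
            self.losses += 1


def goals_by_team(data, start, end):
    """Return [[team, goals_scored, goals_conceded], ...] for years start..end."""
    stats = {}
    rows = zip(data['date'], data['home_team'], data['away_team'],
               data['home_score'], data['away_score'])
    for date, home, away, home_score, away_score in rows:
        year = int(str(date)[:4])  # assumes the date starts with 'YYYY'
        if not start <= year <= end:
            continue
        home_goals, away_goals = int(home_score), int(away_score)
        # setdefault creates the team's entry the first time it is seen.
        stats.setdefault(home, TeamStats(home)).add_match(home_goals, away_goals)
        stats.setdefault(away, TeamStats(away)).add_match(away_goals, home_goals)
    return [[t.name, t.scored, t.conceded] for t in stats.values()]


# The two sample rows from the question, laid out as a dict of columns (assumed).
sample = {
    'date': ['1873-03-08', '1873-03-09'],
    'home_team': ['England', 'Scotland'],
    'away_team': ['Scotland', 'England'],
    'home_score': [0, 1],
    'away_score': [1, 0],
}
result = goals_by_team(sample, 1873, 1873)
print(result)  # [['England', 0, 2], ['Scotland', 2, 0]]
```

Keying the dictionary by team name means each team gets exactly one entry no matter how many matches it appears in, which avoids the duplicate entries that a plain list runs into.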