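So I've spent some time today trying to crack this and rewriting the question, and I think I've done reasonably well so far.
I have a database of football results; here is the output of head(3):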
Date        Season  home     visitor           FT   hgoal  vgoal  division  tier  totgoal  goaldif  result
1993-04-12  1992    Arsenal  Aston Villa       0-1  0      1      1         1     1        -1       A
1992-09-12  1992    Arsenal  Blackburn Rovers  0-1  0      1      1         1     1        -1       A
1992-10-03  1992    Arsenal  Chelsea           2-1  2      1      1         1     3        1        H
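To experiment without the full database, the three rows above can be typed in directly. The sketch below also recomputes totgoal, goaldif and result from the goal columns. That these columns are derived this way (goaldif = hgoal - vgoal, result H/D/A from its sign) is an assumption inferred from the sample rows, not something the question states.

```python
import numpy as np
import pandas as pd

# The three sample rows from above, typed in by hand (only the columns needed here).
pl = pd.DataFrame({
    "Date": pd.to_datetime(["1993-04-12", "1992-09-12", "1992-10-03"]),
    "Season": [1992, 1992, 1992],
    "home": ["Arsenal", "Arsenal", "Arsenal"],
    "visitor": ["Aston Villa", "Blackburn Rovers", "Chelsea"],
    "hgoal": [0, 0, 2],
    "vgoal": [1, 1, 1],
})

# Assumed derivations of the remaining columns from the goals.
pl["totgoal"] = pl["hgoal"] + pl["vgoal"]
pl["goaldif"] = pl["hgoal"] - pl["vgoal"]
# result: H = home win, D = draw, A = away win, taken from the sign of goaldif
pl["result"] = np.select(
    [pl["goaldif"] > 0, pl["goaldif"] == 0],
    ["H", "D"],
    default="A",
)
print(pl[["home", "visitor", "totgoal", "goaldif", "result"]])
```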
I have written this code:
import numpy as np
import pandas as pd

def my_table(season):
    # Build one league-table row per team from the matches of a single season.
    teams = season['home'].unique().tolist()
    table = []
    for team in teams:
        home = season[season['home'] == team]['result']
        # maps each result code ('H', 'D', 'A') to its count for this team's home games
        hseq = dict(zip(*np.unique(home, return_counts=True)))
        away = season[season['visitor'] == team]['result']
        aseq = dict(zip(*np.unique(away, return_counts=True)))
        team_dict = {
            "season": season.iloc[0]['Season'],
            "team": team,
            "home_pl": sum(hseq.values()),
            "home_w": hseq.get('H', 0),
            "home_d": hseq.get('D', 0),
            "home_l": hseq.get('A', 0),
            "home_gf": season[season['home'] == team]['hgoal'].sum(),
            "home_ga": season[season['home'] == team]['vgoal'].sum(),
            "home_gd": season[season['home'] == team]['goaldif'].sum(),
            "home_pts": hseq.get('H', 0) * 3 + hseq.get('D', 0),
            "away_pl": sum(aseq.values()),
            "away_w": aseq.get('A', 0),
            "away_d": aseq.get('D', 0),
            "away_l": aseq.get('H', 0),
            "away_gf": season[season['visitor'] == team]['vgoal'].sum(),
            "away_ga": season[season['visitor'] == team]['hgoal'].sum(),
            # goaldif is home minus visitor goals, so flip the sign for the away side
            "away_gd": season[season['visitor'] == team]['goaldif'].sum() * -1,
            # away points come from away wins and away draws (aseq, not hseq)
            "away_pts": aseq.get('A', 0) * 3 + aseq.get('D', 0),
        }
        # Overall totals are home + away for each statistic.
        for stat in ['pl', 'w', 'd', 'l', 'gf', 'ga', 'gd', 'pts']:
            team_dict[stat] = team_dict['home_' + stat] + team_dict['away_' + stat]
        table.append(team_dict)
    return table
# pl is the full match DataFrame shown above (all seasons)
seasons = pl['Season'].unique().tolist()
all_tables = []
for season in seasons:
    table = my_table(pl[pl['Season'] == season])
    all_tables += table
tbl = pd.DataFrame(all_tables)
away = ['away_pl', 'away_w', 'away_d', 'away_l', 'away_gf', 'away_ga', 'away_gd', 'away_pts']
home = ['home_pl', 'home_w', 'home_d', 'home_l', 'home_gf', 'home_ga', 'home_gd', 'home_pts']
full = ['pl', 'w', 'd', 'l', 'gf', 'ga', 'gd', 'pts']
team = ['team']
tbl = tbl[['season', 'team'] + home + away + full]
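Because every total in the table is derived from a handful of counts, a quick invariant check can catch copy-paste slips between the home and away dictionaries (for example, taking away draws from the home counts). A minimal sketch, assuming the column names built above; `check_table` is a hypothetical helper and the one-row table is made-up data for illustration.

```python
import pandas as pd

STATS = ["pl", "w", "d", "l", "gf", "ga", "gd", "pts"]


def check_table(tbl: pd.DataFrame) -> None:
    """Raise ValueError if the home_, away_ or overall columns are inconsistent."""
    for prefix in ("home_", "away_", ""):
        c = {stat: tbl[prefix + stat] for stat in STATS}
        if not (c["pl"] == c["w"] + c["d"] + c["l"]).all():
            raise ValueError(f"{prefix}pl != {prefix}w + {prefix}d + {prefix}l")
        if not (c["gd"] == c["gf"] - c["ga"]).all():
            raise ValueError(f"{prefix}gd != {prefix}gf - {prefix}ga")
        if not (c["pts"] == 3 * c["w"] + c["d"]).all():
            raise ValueError(f"{prefix}pts != 3 * {prefix}w + {prefix}d")
    # Every overall column must equal its home and away parts added together.
    for stat in STATS:
        if not (tbl[stat] == tbl["home_" + stat] + tbl["away_" + stat]).all():
            raise ValueError(f"{stat} != home_{stat} + away_{stat}")


# Made-up single team: at home one win and one draw, away one draw and one loss.
good = pd.DataFrame([{
    "home_pl": 2, "home_w": 1, "home_d": 1, "home_l": 0,
    "home_gf": 3, "home_ga": 1, "home_gd": 2, "home_pts": 4,
    "away_pl": 2, "away_w": 0, "away_d": 1, "away_l": 1,
    "away_gf": 1, "away_ga": 2, "away_gd": -1, "away_pts": 1,
    "pl": 4, "w": 1, "d": 2, "l": 1, "gf": 4, "ga": 3, "gd": 1, "pts": 5,
}])
check_table(good)  # passes silently
```

Running it on the real `tbl` after it is built would flag any row where the arithmetic does not add up.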
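So now 'tbl' is fine and I can index it by season. But I'm struggling to turn it into a MultiIndex that goes by season first and then by points (descending), which corresponds to each team's final league position. To be clear, I want the inner index to run 1-20 (or 1-22), but with the order driven by total points.
Also, if anyone has ideas on how I built the table itself, I'd be glad to hear them. I spent a long time trying various vectorized functions, which I'm told are more efficient, but couldn't get them to work and fell back on loops.
Thanks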
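On the vectorization question, one common approach is to reshape each match into two rows, one per team, and let groupby do all the counting. A minimal sketch, assuming the column names from the question (Season, home, visitor, hgoal, vgoal); `league_tables` is a hypothetical name, it derives wins, draws and losses from the goals rather than from the result column, and the third match in the usage data is made up.

```python
import pandas as pd


def league_tables(matches: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (season, team) with home_, away_ and overall stats."""
    # One row per team per match: first from the home side's point of view...
    home = pd.DataFrame({"season": matches["Season"], "team": matches["home"],
                         "gf": matches["hgoal"], "ga": matches["vgoal"], "venue": "home"})
    # ...then from the visitor's point of view (goals for/against swapped).
    away = pd.DataFrame({"season": matches["Season"], "team": matches["visitor"],
                         "gf": matches["vgoal"], "ga": matches["hgoal"], "venue": "away"})
    rows = pd.concat([home, away], ignore_index=True)

    rows["pl"] = 1
    rows["w"] = (rows["gf"] > rows["ga"]).astype(int)
    rows["d"] = (rows["gf"] == rows["ga"]).astype(int)
    rows["l"] = (rows["gf"] < rows["ga"]).astype(int)
    rows["gd"] = rows["gf"] - rows["ga"]
    rows["pts"] = 3 * rows["w"] + rows["d"]

    stats = ["pl", "w", "d", "l", "gf", "ga", "gd", "pts"]
    # Sum per venue, then turn the venue level into home_/away_ column prefixes.
    by_venue = (rows.groupby(["season", "team", "venue"])[stats].sum()
                .unstack("venue", fill_value=0))
    by_venue.columns = [f"{venue}_{stat}" for stat, venue in by_venue.columns]
    # Overall totals across both venues.
    totals = rows.groupby(["season", "team"])[stats].sum()
    return pd.concat([by_venue, totals], axis=1).reset_index()


matches = pd.DataFrame({
    "Season": [1992, 1992, 1992],
    "home": ["Arsenal", "Arsenal", "Chelsea"],
    "visitor": ["Aston Villa", "Chelsea", "Arsenal"],
    "hgoal": [0, 2, 1],
    "vgoal": [1, 1, 1],
})
tbl = league_tables(matches)
print(tbl[["season", "team", "pl", "w", "d", "l", "gd", "pts"]])
```

Because there is no Python-level loop over teams or seasons, this should scale to all seasons at once; the column order differs from the question's, so reselect the columns afterwards if needed.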
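Answer 0 (score: 0)
Consider using GroupBy.rank or Series.rank with ascending=False on pts to rank the teams. Since I can't tell whether your final data frame is at the season, team or game level, choose the appropriate ranking: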
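# either: rank rows within each (season, team) group
tbl['team_rank'] = tbl.groupby(['season', 'team'])['pts'].rank(ascending=False)
# or: rank rows across the whole frame, ignoring season
tbl['team_rank'] = tbl['pts'].rank(ascending=False)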
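For a frame like tbl, with one row per team per season, grouping by ['season', 'team'] puts each row in its own group (every rank comes out 1.0), and Series.rank ignores the season. Ranking within each season means grouping by season alone. A small sketch on made-up data, also showing how rank's `method` parameter treats teams level on points:

```python
import pandas as pd

tbl = pd.DataFrame({
    "season": [1992, 1992, 1992, 1993, 1993],
    "team": ["Team A", "Team B", "Team C", "Team A", "Team B"],
    "pts": [84, 84, 70, 92, 80],
})

by_season = tbl.groupby("season")["pts"]
# Default method="average": tied teams share the mean of their positions.
tbl["rank_avg"] = by_season.rank(ascending=False)
# method="min": tied teams share the best position, league-table style.
tbl["rank_min"] = by_season.rank(ascending=False, method="min").astype(int)
# method="first": ties broken by row order, so positions are unique.
tbl["rank_first"] = by_season.rank(ascending=False, method="first").astype(int)
print(tbl)
```

None of these methods look at goal difference, so when ties should be broken by gd and gf a sort on those columns is still needed.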
Then call set_index on the pair of fields to build the MultiIndex; no prior sorting is needed:
tbl = tbl.set_index(['season', 'team_rank'])
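Once the MultiIndex exists, a season's table and individual positions can be pulled out with .loc and .xs. A small sketch with made-up team names and ranks; sort_index is added because set_index keeps the existing row order, and sorting makes each season print in position order.

```python
import pandas as pd

tbl = pd.DataFrame({
    "season": [1992, 1992, 1993, 1993],
    "team_rank": [2, 1, 1, 2],
    "team": ["Team A", "Team B", "Team C", "Team A"],
    "pts": [74, 84, 92, 84],
})
tbl = tbl.set_index(["season", "team_rank"]).sort_index()

# Whole table for one season, ordered by position (the season level is dropped).
print(tbl.loc[1992])
# A single position in a single season.
print(tbl.loc[(1993, 1), "team"])
# The same position in every season, via a cross-section on the inner level.
print(tbl.xs(1, level="team_rank"))
```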
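However, since you need several fields for the ranking (points, then goal difference, then goals for), consider sorting on them with sort_values and then numbering the rows within each season with groupby().cumcount() (+ 1 if you don't want to start from zero). A plain reset_index followed by index.values + 1 would number the rows across all seasons combined, so positions would not restart at 1 for each season:
tbl = tbl.sort_values(['season', 'pts', 'gd', 'gf'],
                      ascending=[True, False, False, False]).reset_index(drop=True)
# cumcount numbers rows 0, 1, 2, ... within each season, in the sorted order
tbl['rank'] = tbl.groupby('season').cumcount() + 1
tbl = tbl.set_index(['season', 'rank'])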
Answer 1 (score: 0)
This is how I got it working, using the code above...
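# count the teams in each season (groupby returns the seasons in ascending order)
team_count = tbl.groupby(['season'])['team'].count().tolist()
# build 1..n for each season, in that same season order
rank_column = []
for i in team_count:
    j = list(range(1, i + 1, 1))
    rank_column += j
# sort by season, then by points, goal difference and goals for (best first)
tbl = tbl.sort_values(['season', 'pts', 'gd', 'gf'], ascending=[True, False, False, False])
tbl['rank'] = rank_column
tbl = tbl.set_index(['season', 'rank'])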
Not sure whether this is the most efficient way, but it works.
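The loop that builds rank_column can be replaced by groupby(...).cumcount(), which numbers rows 0, 1, 2, ... within each group in their current order. A minimal sketch on made-up data comparing the two. One advantage of cumcount is that it is aligned on the index, so it does not rely on the per-season counts coming out in the same season order as the sorted frame.

```python
import pandas as pd

tbl = pd.DataFrame({
    "season": [1993, 1992, 1992, 1993, 1992],
    "team": ["Team A", "Team B", "Team C", "Team B", "Team A"],
    "pts": [80, 70, 84, 92, 75],
    "gd": [10, 5, 30, 40, 8],
    "gf": [60, 50, 70, 80, 55],
})
tbl = tbl.sort_values(["season", "pts", "gd", "gf"],
                      ascending=[True, False, False, False])

# Loop version from the answer above: 1..n for each season's team count.
team_count = tbl.groupby(["season"])["team"].count().tolist()
rank_column = []
for n in team_count:
    rank_column += list(range(1, n + 1))

# Vectorized version: number rows within each season in their sorted order.
cumcount_rank = tbl.groupby("season").cumcount() + 1

tbl["rank"] = cumcount_rank
tbl = tbl.set_index(["season", "rank"])
print(tbl)
```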