DataFrame of football results into a league table

Date: 2017-03-23 13:29:09

Tags: python pandas numpy

I've spent some time today trying to crack this problem and rewriting this question, and I think I've done reasonably well so far.

I have a database of football results. This is its head(3):

      Date Season     home           visitor   FT  hgoal  vgoal  division  tier  totgoal  goaldif result
1993-04-12   1992  Arsenal       Aston Villa  0-1      0      1         1     1        1       -1      A  
1992-09-12   1992  Arsenal  Blackburn Rovers  0-1      0      1         1     1        1       -1      A  
1992-10-03   1992  Arsenal           Chelsea  2-1      2      1         1     1        3        1      H

I have written this code:

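import numpy as np
import pandas as pd

# Build one dict of home, away and overall totals per team for a single season.
def my_table(season):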
    teams = season['home'].unique().tolist()
    table = []
    for team in teams :
        home = season[season['home'] == team]['result']
        hseq = dict(zip(*np.unique(home, return_counts=True)))

        away = season[season['visitor'] == team]['result']
        aseq = dict(zip(*np.unique(away, return_counts=True)))

        team_dict = {
            "season"   : season.iloc[0]['Season'],
            "team"     : team,
            "home_pl"  : sum(hseq.values()),
            "home_w"   : hseq.get('H', 0),
            "home_d"   : hseq.get('D', 0),
            "home_l"   : hseq.get('A', 0),
            "home_gf"  : season[season['home'] == team]['hgoal'].sum(),
            "home_ga"  : season[season['home'] == team]['vgoal'].sum(),
            "home_gd"  : season[season['home'] == team]['goaldif'].sum(),
            "home_pts" : hseq.get('H', 0) * 3 + hseq.get('D', 0),
            "away_pl"  : sum(aseq.values()),
            "away_w"   : aseq.get('A', 0),
            "away_d"   : aseq.get('D', 0),
            "away_l"   : aseq.get('H', 0),
            "away_gf"  : season[season['visitor'] == team]['vgoal'].sum(),
            "away_ga"  : season[season['visitor'] == team]['hgoal'].sum(),
            # goaldif is stored from the home side's view, so flip the sign
            "away_gd"  : (season[season['visitor'] == team]['goaldif'].sum() * -1),
            # away draws come from aseq (the away results), not hseq
            "away_pts" : aseq.get('A', 0) * 3 + aseq.get('D', 0)
        }
        team_dict["pl"]  = team_dict["home_pl"]  + team_dict['away_pl']            
        team_dict["w"]   = team_dict["home_w"]   + team_dict['away_w']            
        team_dict["d"]   = team_dict["home_d"]   + team_dict['away_d']            
        team_dict["l"]   = team_dict["home_l"]   + team_dict['away_l']
        team_dict["gf"]  = team_dict["home_gf"]  + team_dict['away_gf']
        team_dict["ga"]  = team_dict["home_ga"]  + team_dict['away_ga']
        team_dict["gd"]  = team_dict["home_gd"]  + team_dict['away_gd']
        team_dict["pts"] = team_dict["home_pts"] + team_dict['away_pts']
        table.append(team_dict)
    return table

# pl is the full results DataFrame whose head(3) is shown above
seasons = pl['Season'].unique().tolist()
all_tables = []
for season in seasons :
    table = my_table(pl[pl['Season'] == season])
    all_tables += table

tbl = pd.DataFrame(all_tables) 

away = ['away_pl', 'away_w', 'away_d', 'away_l', 'away_gf', 'away_ga', 'away_gd', 'away_pts']
home = ['home_pl', 'home_w', 'home_d', 'home_l', 'home_gf', 'home_ga', 'home_gd', 'home_pts']
full = ['pl', 'w', 'd', 'l', 'gf', 'ga', 'gd', 'pts']
team = ['team']
tbl = tbl[['season', 'team']+home+away+full]

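So now 'tbl' is fine, and I can index it by season. But I'm struggling to turn it into a MultiIndex that is indexed first by season and then by points (descending), which is equivalent to each team's final league position. To be clear, I want the second index level to run 1-20 (or 1-22), with the order driven by the points total.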

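Also, if anyone has thoughts on how I have built the table itself, I'd be happy to hear them. I spent a long time trying various vectorized functions, which I'm told are more efficient, but I couldn't get them to work and reverted to loops.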

Thanks

2 answers:

Answer 0: (score: 0)

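Consider using GroupBy.rank or Series.rank to rank teams by pts in descending order. Since I can't tell whether your final data frame is at the season, team, or game level, choose the appropriate ranking: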

tbl['team_rank'] = tbl.groupby(['season', 'team'])['pts'].rank(ascending=False)

tbl['team_rank'] = tbl['pts'].rank(ascending=False)

Then use set_index on the pair of fields to build the MultiIndex, with no need to sort beforehand.

tbl = tbl.set_index(['season', 'team_rank'])
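
For a table with one row per team per season, as tbl in the question appears to be, grouping by season alone is probably the variant that yields a league position, since grouping by both season and team leaves one row per group and every rank comes out as 1. A minimal sketch, assuming a made-up miniature of tbl (the team names and points are invented for illustration) and using method="min" so that teams level on points share a position:

```python
import pandas as pd

# Hypothetical miniature of tbl: one row per team per season.
tbl = pd.DataFrame({
    "season": [1992, 1992, 1992, 1993, 1993],
    "team":   ["A", "B", "C", "A", "B"],
    "pts":    [70, 84, 70, 60, 90],
})

# Rank within each season; method="min" gives tied teams the same, best position.
tbl["team_rank"] = (
    tbl.groupby("season")["pts"]
       .rank(method="min", ascending=False)
       .astype(int)
)

# set_index does not sort, so sort_index is applied only to make the output readable.
ranked = tbl.set_index(["season", "team_rank"]).sort_index()
print(ranked)
```

Because ties share a rank here, the (season, team_rank) index is not guaranteed to be unique. If unique positions are required, a tie-break on further columns is needed.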

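However, since you need several fields to decide the ranking, consider sorting on them, calling reset_index, and then retrieving index.values to get an ordered numbering (+ 1 if you don't want to start from zero):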

tbl = tbl.sort_values(['season', 'pts', 'gd', 'gf'], 
                      ascending=[True, False, False, False]).reset_index(drop=True)
tbl['rank'] = tbl.index.values + 1
tbl = tbl.set_index(['season', 'rank'])
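
Note that index.values after reset_index is a single running count over the whole frame, so with several seasons the numbering continues past the end of the first season instead of restarting at 1. One possible way to get 1-20 within each season is GroupBy.cumcount, sketched here on invented sample data (the teams and figures are hypothetical):

```python
import pandas as pd

# Hypothetical sample: two seasons, deliberately unsorted.
tbl = pd.DataFrame({
    "season": [1993, 1992, 1992, 1993, 1992],
    "team":   ["A", "A", "B", "B", "C"],
    "pts":    [60, 70, 84, 90, 70],
    "gd":     [-5, 10, 30, 40, 12],
    "gf":     [40, 55, 70, 80, 50],
})

# Order by season, then by points, goal difference and goals scored.
tbl = tbl.sort_values(["season", "pts", "gd", "gf"],
                      ascending=[True, False, False, False])

# cumcount numbers the rows 0, 1, 2, ... within each season in their current order.
tbl["rank"] = tbl.groupby("season").cumcount() + 1
tbl = tbl.set_index(["season", "rank"])
print(tbl)
```

With this numbering, teams level on points are separated by gd and then gf, so every (season, rank) pair is unique.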

Answer 1: (score: 0)

This is how I got it working using the code above...

team_count = tbl.groupby(['season'])['team'].count().tolist()
rank_column = []

for i in team_count :
    j = list(range(1,i+1,1))
    rank_column += j

tbl = tbl.sort_values(['season', 'pts', 'gd', 'gf'], ascending=[True, False, False, False])
tbl['rank'] = rank_column
tbl = tbl.set_index(['season', 'rank'])

Not sure if this is the most efficient way, but it works.
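
On the question's side note about building the table without loops, one possible vectorized approach is to stack the home and away views of each match into a single frame and aggregate with groupby. This is only a sketch on a hypothetical three-match sample that reuses the question's column names. It produces the overall columns only, so the home_ and away_ splits would need an extra venue column in the grouping.

```python
import pandas as pd

# Hypothetical three-match sample using the question's column names.
pl = pd.DataFrame({
    "Season":  [1992, 1992, 1992],
    "home":    ["Arsenal", "Chelsea", "Arsenal"],
    "visitor": ["Chelsea", "Arsenal", "Leeds"],
    "hgoal":   [2, 1, 0],
    "vgoal":   [1, 1, 1],
})

# One row per team per match, seen from that team's side.
home = pl.rename(columns={"home": "team", "hgoal": "gf", "vgoal": "ga"})
away = pl.rename(columns={"visitor": "team", "vgoal": "gf", "hgoal": "ga"})
cols = ["Season", "team", "gf", "ga"]
sides = pd.concat([home[cols], away[cols]], ignore_index=True)

# Per-match indicators that can simply be summed.
sides["pl"] = 1
sides["w"] = (sides["gf"] > sides["ga"]).astype(int)
sides["d"] = (sides["gf"] == sides["ga"]).astype(int)
sides["l"] = (sides["gf"] < sides["ga"]).astype(int)

table = sides.groupby(["Season", "team"])[["pl", "w", "d", "l", "gf", "ga"]].sum()
table["gd"] = table["gf"] - table["ga"]
table["pts"] = 3 * table["w"] + table["d"]
print(table)
```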