Difference between two columns of a DataFrame

Time: 2019-10-14 05:56:49

Tags: python-3.x pandas numpy dataframe

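I'm new to Python and want to find the difference between two columns of a DataFrame, together with the corresponding value from a third column. For example, I have a DataFrame "football" that lists all the football teams along with the goals scored for and against each club. I want to get the goal difference together with the team name, i.e. (Goal Diff = GoalsFor - GoalsAgainst).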

   Pos             Team  Seasons  Points  GamesPlayed  GamesWon  GamesDrawn  \
0    1      Real Madrid       86    5656         2600      1647         552   
1    2        Barcelona       86    5435         2500      1581         573   
2    3  Atletico Madrid       80    5111         2614      1241         598   

   GamesLost  GoalsFor  GoalsAgainst  
0        563      5947          3140  
1        608      5900          3114  
2        775      4534          3309  

I tried writing a function and then iterating over each row of the DataFrame, as follows:

def Goal_diff_count_Formal(gFor, gAgainst, team):
    # Return the team name together with its goal difference.
    goalsDifference = gFor - gAgainst
    return [team, goalsDifference]

# The enclosing def was not shown in the post; this name is hypothetical.
def goal_differences(football):
    # Collect one [team, goal difference] pair per row.
    total = []
    for index, row in football.iterrows():
        goalsFor = row['GoalsFor']
        goalsAgainst = row['GoalsAgainst']
        teamName = row['Team']
        total.append(Goal_diff_count_Formal(int(goalsFor), int(goalsAgainst), teamName))
    return total

However, I would like to know whether there is a faster way to get this information, something like:

dataframe['GoalsFor'] - dataframe['GoalsAgainst']  # along with the team name in the dataframe

1 answer:

Answer 0 (score: 1)

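Solution if the values in the Team column are unique: set Team as the index, subtract the two columns, and then select a team by its index label: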

df = df.set_index('Team')
s = df['GoalsFor'] - df['GoalsAgainst'] 
print (s)
Team
Real Madrid        2807
Barcelona          2786
Atletico Madrid    1225
dtype: int64

print (s['Atletico Madrid'])
1225
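
If the team name should stay as an ordinary column rather than become the index, a minimal sketch along the following lines may help. It rebuilds only the three relevant columns of the sample data, and the column name `GoalDiff` is an assumption chosen for illustration, not something taken from the question.

```python
import pandas as pd

# Sample data rebuilt from the question (only the columns needed here).
football = pd.DataFrame({
    'Team': ['Real Madrid', 'Barcelona', 'Atletico Madrid'],
    'GoalsFor': [5947, 5900, 4534],
    'GoalsAgainst': [3140, 3114, 3309],
})

# Vectorized subtraction; 'GoalDiff' is a hypothetical column name.
result = football.assign(GoalDiff=football['GoalsFor'] - football['GoalsAgainst'])

# Keep only the team name and its goal difference.
result = result[['Team', 'GoalDiff']]
print(result)
```

Because the subtraction works on whole columns at once, no explicit loop over the rows is needed.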

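Solution if duplicate values are possible in the Team column:

I think you need to group by Team and aggregate with sum first, and only then take the difference: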

#change sample data for Team in row 3
print (df)
   Pos         Team  Seasons  Points  GamesPlayed  GamesWon  GamesDrawn  \
0    1  Real Madrid       86    5656         2600      1647         552   
1    2    Barcelona       86    5435         2500      1581         573   
2    3  Real Madrid       80    5111         2614      1241         598   

   GamesLost  GoalsFor  GoalsAgainst  
0        563      5947          3140  
1        608      5900          3114  
2        775      4534          3309  


# select the columns with a list; a bare tuple of column names is rejected by recent pandas
df = df.groupby('Team')[['GoalsFor','GoalsAgainst']].sum()
df['diff'] = df['GoalsFor'] - df['GoalsAgainst'] 
print (df)
             GoalsFor  GoalsAgainst  diff
Team                                     
Barcelona        5900          3114  2786
Real Madrid     10481          6449  4032

EDIT:

s = df['GoalsFor'] - df['GoalsAgainst'] 
print (s)
Team
Barcelona      2786
Real Madrid    4032
dtype: int64

print (s['Barcelona'])
2786
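
For anyone who wants to reproduce the duplicated-team case end to end, here is a self-contained sketch. It assumes only the three relevant columns of the modified sample data, and it passes `as_index=False` so that Team remains a regular column; that is a variation on the approach above, not the code from the answer. The variable name `totals` is likewise just illustrative.

```python
import pandas as pd

# Modified sample data: 'Real Madrid' appears twice.
df = pd.DataFrame({
    'Team': ['Real Madrid', 'Barcelona', 'Real Madrid'],
    'GoalsFor': [5947, 5900, 4534],
    'GoalsAgainst': [3140, 3114, 3309],
})

# Sum goals per team; as_index=False keeps Team as a regular column.
totals = df.groupby('Team', as_index=False)[['GoalsFor', 'GoalsAgainst']].sum()

# Goal difference computed on the aggregated totals.
totals['diff'] = totals['GoalsFor'] - totals['GoalsAgainst']
print(totals)
```

With Team kept as a column, a single team can be picked out with a boolean mask such as `totals[totals['Team'] == 'Barcelona']` instead of an index lookup.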