Question

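I'm new to Python and want to find the difference between two columns of a DataFrame. What I need is the difference between two columns together with the corresponding value from a third column. For example, I have a DataFrame `football` listing all the football teams, with the goals each club has scored and conceded. I want the goal difference along with the team name, i.e. (Goal Diff = goalsFor - goalsAgainst).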

   Pos             Team  Seasons  Points  GamesPlayed  GamesWon  GamesDrawn  \
0    1      Real Madrid       86    5656         2600      1647         552
1    2        Barcelona       86    5435         2500      1581         573
2    3  Atletico Madrid       80    5111         2614      1241         598

   GamesLost  GoalsFor  GoalsAgainst
0        563      5947          3140
1        608      5900          3114
2        775      4534          3309
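To reproduce the example, the three sample rows above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the column names exactly as printed; the variable name `football` comes from the question.

```python
import pandas as pd

# Rebuild the three sample rows shown above (column names as printed)
football = pd.DataFrame({
    'Pos': [1, 2, 3],
    'Team': ['Real Madrid', 'Barcelona', 'Atletico Madrid'],
    'Seasons': [86, 86, 80],
    'Points': [5656, 5435, 5111],
    'GamesPlayed': [2600, 2500, 2614],
    'GamesWon': [1647, 1581, 1241],
    'GamesDrawn': [552, 573, 598],
    'GamesLost': [563, 608, 775],
    'GoalsFor': [5947, 5900, 4534],
    'GoalsAgainst': [3140, 3114, 3309],
})

# Show only the columns relevant to the goal difference
print(football[['Team', 'GoalsFor', 'GoalsAgainst']])
```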

I tried writing a function and then iterating over each row of the DataFrame, as follows:

def Goal_diff_count_Formal(gFor, gAgainst, team):
    # Return the team name together with its goal difference
    goalsDifference = gFor - gAgainst
    return [team, goalsDifference]

# Collect one [team, difference] pair per row of the football DataFrame
total = []
for index, row in football.iterrows():
    goalsFor = row['GoalsFor']
    goalsAgainst = row['GoalsAgainst']
    teamName = row['Team']
    total.append(Goal_diff_count_Formal(int(goalsFor), int(goalsAgainst), teamName))

print(total)

However, I'd like to know whether there is a faster way to get this, something like:

dataframe['GoalsFor'] - dataframe['GoalsAgainst']  # along with the team name in the dataframe
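Column subtraction like this is vectorized, so it works on whole columns without a Python loop. One way to keep the team name next to the result is `DataFrame.assign`, which returns a new DataFrame with the extra column. A minimal sketch, assuming the sample columns above; the column name `GoalDiff` is a hypothetical choice:

```python
import pandas as pd

# Sample data: only the columns needed for the goal difference
football = pd.DataFrame({
    'Team': ['Real Madrid', 'Barcelona', 'Atletico Madrid'],
    'GoalsFor': [5947, 5900, 4534],
    'GoalsAgainst': [3140, 3114, 3309],
})

# Subtract whole columns at once; assign() adds the result as a new column
# (GoalDiff is a hypothetical name) and the Team column is kept alongside it
result = football.assign(
    GoalDiff=football['GoalsFor'] - football['GoalsAgainst']
)[['Team', 'GoalDiff']]
print(result)
```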

Answer 1

Solution if the values in the Team column are unique: set Team as the index, subtract the columns, and select a team by its index label:

df = df.set_index('Team')
s = df['GoalsFor'] - df['GoalsAgainst'] 
print (s)
Team
Real Madrid        2807
Barcelona          2786
Atletico Madrid    1225
dtype: int64

print (s['Atletico Madrid'])
1225
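The uniqueness condition matters because label lookup on a duplicated index does not return a single number. A minimal sketch using `Series.is_unique` to check first, on the same modified data as the duplicate example further down (Real Madrid appears twice):

```python
import pandas as pd

# Real Madrid appears twice, as in the duplicate example
df = pd.DataFrame({
    'Team': ['Real Madrid', 'Barcelona', 'Real Madrid'],
    'GoalsFor': [5947, 5900, 4534],
    'GoalsAgainst': [3140, 3114, 3309],
})

# False here: set_index('Team') would produce a non-unique index
print(df['Team'].is_unique)

indexed = df.set_index('Team')
s = indexed['GoalsFor'] - indexed['GoalsAgainst']

# With a duplicated label, lookup returns one value per matching row (a Series)
print(s['Real Madrid'])
```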

Solution if the Team column may contain duplicate values:

I think you need to group by Team and aggregate with sum first, then take the difference:

# sample data changed: Team in the third row (index 2) is now Real Madrid
print (df)
   Pos         Team  Seasons  Points  GamesPlayed  GamesWon  GamesDrawn  \
0    1  Real Madrid       86    5656         2600      1647         552   
1    2    Barcelona       86    5435         2500      1581         573   
2    3  Real Madrid       80    5111         2614      1241         598   

   GamesLost  GoalsFor  GoalsAgainst  
0        563      5947          3140  
1        608      5900          3114  
2        775      4534          3309  


df = df.groupby('Team')[['GoalsFor', 'GoalsAgainst']].sum()
df['diff'] = df['GoalsFor'] - df['GoalsAgainst'] 
print (df)
             GoalsFor  GoalsAgainst  diff
Team                                     
Barcelona        5900          3114  2786
Real Madrid     10481          6449  4032
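Because the difference of two sums equals the sum of the per-row differences, the same per-team result can also be obtained by subtracting per row first and then summing within each team. A minimal sketch on the modified sample data, reusing the column name `diff` from the answer:

```python
import pandas as pd

# Modified sample data: Real Madrid appears twice
df = pd.DataFrame({
    'Team': ['Real Madrid', 'Barcelona', 'Real Madrid'],
    'GoalsFor': [5947, 5900, 4534],
    'GoalsAgainst': [3140, 3114, 3309],
})

# Per-row goal difference first, then total it per team
per_team = (
    df.assign(diff=df['GoalsFor'] - df['GoalsAgainst'])
      .groupby('Team')['diff']
      .sum()
)
print(per_team)
```

This avoids keeping the aggregated GoalsFor and GoalsAgainst columns when only the difference is needed.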

Edit: to get only the difference as a Series indexed by Team:

s = df['GoalsFor'] - df['GoalsAgainst'] 
print (s)
Team
Barcelona      2786
Real Madrid    4032
dtype: int64

print (s['Barcelona'])
2786
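Grouping over labels that are already unique leaves one row per group, so the groupby version also covers the unique-team case. The visible difference is ordering: `groupby` sorts the group keys by default, and `sort=False` keeps them in order of first appearance. A minimal sketch on the original (unique) sample data:

```python
import pandas as pd

# Original sample data: every team appears once
df = pd.DataFrame({
    'Team': ['Real Madrid', 'Barcelona', 'Atletico Madrid'],
    'GoalsFor': [5947, 5900, 4534],
    'GoalsAgainst': [3140, 3114, 3309],
})

# Default sort=True: result is ordered alphabetically by team name
totals = df.groupby('Team')[['GoalsFor', 'GoalsAgainst']].sum()
s = totals['GoalsFor'] - totals['GoalsAgainst']
print(s)

# sort=False keeps teams in the order they first appear in df
s_unsorted = df.groupby('Team', sort=False)[['GoalsFor', 'GoalsAgainst']].sum()
print(s_unsorted)
```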
