Question

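I want to compare one row with the next row in a Python dataframe and, if they are the same, add some of their values together. However, the code I wrote does not work.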

The head of my play dataframe is below.

          GameCode  PlayNumber  PeriodNumber  Clock  OffenseTeamCode  \
0  299004720130829           1             1    900               47   
1  299004720130829           2             1    NaN              299   
2  299004720130829           3             1    NaN              299   
3  299004720130829           4             1    NaN              299   
4  299004720130829           5             1    NaN              299   

   DefenseTeamCode  OffensePoints  DefensePoints  Down  Distance  Spot  \
0              299              0              0   NaN       NaN    65   
1               47              0              0     1        10    75   
2               47              0              0     2         7    72   
3               47              0              0     3         1    66   
4               47              0              0     1        10    64   

  PlayType  DriveNumber  DrivePlay  
0  KICKOFF          NaN        NaN  
1     RUSH            1          1  
2     PASS            1          1  
3     RUSH            1          1  
4     RUSH            1          1

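I want to compare the GameCode of the first row with that of the second row and, when they match, add up some of their values, and so on. But I get an error with the following code.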

print play.head()
df = pd.DataFrame()

rushingyards = 0
passingyards = 0

for row in play.itertuples():
    if df.empty:
        df = play
    else:
        if play['GameCode'] == df['GameCode']:
            if play['PlayType'] in ('RUSH','PASS'):
                if play['PlayType']=='RUSH':
                    rushingyards = rushingyards+play['Distance']
                else:
                    passingyards  = passingyards + play['Distance']

Please help.

Answer 1

Maybe you are looking for a groupby/sum operation:

yards = df.groupby(['GameCode', 'PlayType'])['Distance'].sum().unstack('PlayType')
# PlayType         KICKOFF  PASS  RUSH
# GameCode                            
# 299004720130829      NaN     7    21

For each GameCode and PlayType, this sums the Distance values. unstack then returns a DataFrame whose index is the GameCodes and whose columns are the PlayTypes.
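To make the two steps easier to see, here is a minimal sketch on a made-up toy frame (the `toy` data and the names `totals` and `wide` are illustrative assumptions, not part of the question's data). It shows the intermediate Series that `groupby(...).sum()` produces before `unstack` pivots it:

```python
import pandas as pd

# Hypothetical toy data, much smaller than the asker's table
toy = pd.DataFrame({
    'GameCode': [1, 1, 1, 2],
    'PlayType': ['RUSH', 'PASS', 'RUSH', 'PASS'],
    'Distance': [10.0, 7.0, 1.0, 5.0],
})

# Step 1: one total per (GameCode, PlayType) pair.
# The result is a Series with a two-level index.
totals = toy.groupby(['GameCode', 'PlayType'])['Distance'].sum()

# Step 2: move the PlayType level out of the index and into the columns.
# Pairs that never occur (game 2 has no RUSH here) come out as NaN.
wide = totals.unstack('PlayType')
```

After this, `wide.loc[1, 'RUSH']` is the rushing total for game 1, and each column of `wide` is a Series indexed by GameCode.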

import numpy as np
import pandas as pd
nan = np.nan

df = pd.DataFrame(
    {'Clock': [900.0, nan, nan, nan, nan],
     'DefensePoints': [0, 0, 0, 0, 0],
     'DefenseTeamCode': [299, 47, 47, 47, 47],
     'Distance': [nan, 10.0, 7.0, 1.0, 10.0],
     'Down': [nan, 1.0, 2.0, 3.0, 1.0],
     'DriveNumber': [nan, 1.0, 1.0, 1.0, 1.0],
     'DrivePlay': [nan, 1.0, 1.0, 1.0, 1.0],
     'GameCode': [299004720130829, 299004720130829, 299004720130829,
                  299004720130829, 299004720130829],
     'OffensePoints': [0, 0, 0, 0, 0],
     'OffenseTeamCode': [47, 299, 299, 299, 299],
     'PeriodNumber': [1, 1, 1, 1, 1],
     'PlayNumber': [1, 2, 3, 4, 5],
     'PlayType': ['KICKOFF', 'RUSH', 'PASS', 'RUSH', 'RUSH'],
     'Spot': [65, 75, 72, 66, 64]})

yards = df.groupby(['GameCode', 'PlayType'])['Distance'].sum().unstack('PlayType')
passing_yards, rushing_yards = yards['PASS'], yards['RUSH']

Note that passing_yards and rushing_yards will be Series indexed by GameCode.
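If you really do need a literal row-versus-next-row comparison, one common approach is `shift`. In the question's loop, `play['GameCode'] == df['GameCode']` compares whole columns and yields a Series, and using a Series as an `if` condition raises a ValueError because its truth value is ambiguous. The sketch below uses made-up data (the `plays` frame and the names `same_game_as_next` and `game_start` are assumptions for illustration) and compares all rows at once:

```python
import pandas as pd

# Hypothetical data: two games, in play order
plays = pd.DataFrame({
    'GameCode': [100, 100, 100, 200, 200],
    'PlayType': ['KICKOFF', 'RUSH', 'PASS', 'RUSH', 'PASS'],
    'Distance': [float('nan'), 10.0, 7.0, 3.0, 4.0],
})

# shift(-1) moves every value up one row, so row i lines up with row i + 1.
# The last row has no successor; it is compared against NaN and gives False.
next_code = plays['GameCode'].shift(-1)
same_game_as_next = plays['GameCode'].eq(next_code)

# The mirror image: shift() lines row i up with row i - 1,
# which marks the first play of each game.
game_start = plays['GameCode'].ne(plays['GameCode'].shift())
```

Both results are boolean Series aligned with `plays`, so they can be used as masks, for example `plays[game_start]`. For per-game totals, the groupby approach above is usually simpler than tracking row boundaries by hand.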
