I'm new to pandas.
I have a dataframe that looks like this (only bigger):
Horses RaceDate Position
1 RedHorse 1/2/00 2
2 BlueHorse 1/2/00 6
3 YellowHorse 1/2/00 7
4 RedHorse 15/1/00 3
I want to add a column with each horse's previous result, so that my dataframe would look something like this:
Horses RaceDate Position PrevPosition
1 RedHorse 1/2/00 2 3
2 BlueHorse 1/2/00 6 -
3 YellowHorse 1/2/00 7 -
4 RedHorse 15/1/00 3 -
I tried the following:
def prevRuns(horseName, raceDate):
    horseDf = df.loc[df['Horse'] == horseName]
    currentRace = horseDf.index[horseDf['RaceDate'] == raceDate]
    if len(horseDf.index) >= currentRace:
        return horseDf.at[currentRace+1, 'Position']
    else:
        return 0

df['prevRun'] = df['Horse'].apply(prevRuns, raceDate=df['RaceDate'])
But it doesn't work:
ValueError: Can only compare identically-labeled Series objects
Why doesn't it work?
Is there a more elegant way to do what I'm trying to accomplish?
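The error most likely comes from how Series.apply forwards extra keyword arguments: prevRuns is called once per horse name, but raceDate receives the entire df['RaceDate'] column on every call, not a single date. Inside the function, horseDf['RaceDate'] holds only that horse's rows, so the comparison pits a short Series against the full column, and pandas refuses to compare Series whose index labels differ. The toy reproduction below rebuilds the sample data (index labels 1 to 4 and a column named Horse, as in the code above, are assumptions) and isolates just that comparison:

```python
import pandas as pd

# Sample data from the question; 'Horse' matches the column name used in prevRuns
df = pd.DataFrame(
    {
        "Horse": ["RedHorse", "BlueHorse", "YellowHorse", "RedHorse"],
        "RaceDate": ["1/2/00", "1/2/00", "1/2/00", "15/1/00"],
        "Position": [2, 6, 7, 3],
    },
    index=[1, 2, 3, 4],
)

# What prevRuns sees on its first call: horseName is one value,
# but raceDate is the whole RaceDate column (labels 1..4)
horse_df = df.loc[df["Horse"] == "RedHorse"]  # only labels 1 and 4

message = None
try:
    # Series-to-Series comparison requires identical index labels
    horse_df["RaceDate"] == df["RaceDate"]
except ValueError as exc:
    message = str(exc)

print(message)
```

Even with that fixed, the function uses currentRace + 1 as an index label, which assumes a horse's races sit on consecutive labels, so a vectorised approach is likely the safer route.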
Answer (score: 2)
You can use groupby + shift:
import pandas as pd

# convert dates to datetime and sort descending
df['RaceDate'] = pd.to_datetime(df['RaceDate'], dayfirst=True)
df = df.sort_values('RaceDate', ascending=False)
# groupby and shift for previous position
df['PrevPosition'] = df.groupby('Horses')['Position'].shift(-1)
print(df)
Horses RaceDate Position PrevPosition
1 RedHorse 2000-02-01 2 3.0
2 BlueHorse 2000-02-01 6 NaN
3 YellowHorse 2000-02-01 7 NaN
4 RedHorse 2000-01-15 3 NaN
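For reference, here is a self-contained sketch of the same groupby + shift idea, with the sample data rebuilt from the question (index labels 1 to 4 are assumed). It sorts oldest-first and uses shift(1), which some may find easier to read than a descending sort with shift(-1); the resulting column should be the same. kind="mergesort" is chosen because it is a stable sort, so races on the same day keep their relative order, and sort_index() puts the rows back in their original order afterwards.

```python
import pandas as pd

# Sample data from the question (index labels 1..4 as in the table)
df = pd.DataFrame(
    {
        "Horses": ["RedHorse", "BlueHorse", "YellowHorse", "RedHorse"],
        "RaceDate": ["1/2/00", "1/2/00", "1/2/00", "15/1/00"],
        "Position": [2, 6, 7, 3],
    },
    index=[1, 2, 3, 4],
)

# Dates are day-first: 15/1/00 is 15 January 2000
df["RaceDate"] = pd.to_datetime(df["RaceDate"], dayfirst=True)

# Oldest race first, so shift(1) within each horse looks one race back in time
df = df.sort_values("RaceDate", kind="mergesort")
df["PrevPosition"] = df.groupby("Horses")["Position"].shift(1)

# Restore the original row order for display
df = df.sort_index()
print(df)
```

A horse's first race has no predecessor, so it gets NaN, which is also why the column comes out as float rather than int; fillna(0) would reproduce the 0 that the original prevRuns returned in that case.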