I have this DataFrame:
ID Date X 123_Var 456_Var 789_Var
A 16-07-19 3 777 250 810
A 17-07-19 9 637 121 529
A 20-07-19 2 295 272 490
A 21-07-19 3 778 600 544
A 22-07-19 6 741 792 907
B 01-07-19 4 509 690 406
B 03-07-19 2 413 725 414
B 04-07-19 2 170 702 912
B 09-08-19 3 851 616 477
B 10-08-19 9 475 447 555
B 11-08-19 1 412 403 708
B 12-08-19 2 299 537 321
B 13-08-19 4 310 119 125
C 14-08-19 4 912 755 657
C 15-08-19 4 586 771 394
C 17-08-19 2 500 528 764
C 18-08-19 1 982 383 654
C 20-08-19 3 336 691 496
C 21-08-19 3 206 433 263
C 22-08-19 2 373 319 111
D 10-12-18 2 170 702 912
E 10-12-18 2 912 755 657
E 14-12-18 2 373 319 111
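I want to shift the values in each of the columns 123_Var, 456_Var and 789_Var.
A value should be shifted only when the dates are exactly one day apart; otherwise it should stay NaN.
The shift should be done separately for each ID (via groupby).
Expected result: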
ID Date X 123_Var 456_Var 789_Var 123_Var_S 456_Var_S 789_Var_S
A 16-07-19 3 777 250 810 NaN NaN NaN
A 17-07-19 9 637 121 529 777.0 250.0 810.0
A 20-07-19 2 295 272 490 NaN NaN NaN
A 21-07-19 3 778 600 544 295.0 272.0 490.0
A 22-07-19 6 741 792 907 778.0 600.0 544.0
B 01-07-19 4 509 690 406 NaN NaN NaN
B 03-07-19 2 413 725 414 NaN NaN NaN
B 04-07-19 2 170 702 912 413.0 725.0 414.0
B 09-08-19 3 851 616 477 NaN NaN NaN
B 10-08-19 9 475 447 555 851.0 616.0 477.0
B 11-08-19 1 412 403 708 475.0 447.0 555.0
B 12-08-19 2 299 537 321 412.0 403.0 708.0
B 13-08-19 4 310 119 125 299.0 537.0 321.0
C 14-08-19 4 912 755 657 NaN NaN NaN
C 15-08-19 4 586 771 394 912.0 755.0 657.0
C 17-08-19 2 500 528 764 NaN NaN NaN
C 18-08-19 1 982 383 654 500.0 528.0 764.0
C 20-08-19 3 336 691 496 NaN NaN NaN
C 21-08-19 3 206 433 263 336.0 691.0 496.0
C 22-08-19 2 373 319 111 206.0 433.0 263.0
D 10-12-18 2 170 702 912 NaN NaN NaN
E 10-12-18 2 912 755 657 NaN NaN NaN
E 14-12-18 2 373 319 111 NaN NaN NaN
Answer 0: (score: 1)
IIUC, we can group, apply a filter, and use shift and .loc to assign your values:
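import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')
# True where the previous row of the same ID is exactly one day earlier
s = df.groupby('ID')['Date'].diff().eq(pd.Timedelta(days=1))
cols = df.filter(like='Var').columns.map(lambda x: x + '_S')
# shift everything down one row, then blank out rows that fail the one-day test
df[cols] = df.filter(like='Var').shift()
df.loc[~s, cols] = np.nan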
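For anyone who wants to run the idea end to end, here is a minimal self-contained sketch on the first few rows of the sample data. The helper name `shift_if_consecutive` and its parameters are made up for illustration and are not part of the answer. It assumes the rows are already sorted by date within each ID, and it shifts inside each group so values never cross an ID boundary.

```python
import pandas as pd


def shift_if_consecutive(df, id_col="ID", date_col="Date", like="Var", suffix="_S"):
    """Hypothetical helper: per-ID shift that keeps a value only after a one-day gap."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col], format="%d-%m-%y")
    var_cols = list(out.filter(like=like).columns)
    # Gap to the previous row of the same ID; NaT (hence False) for the first row of each ID.
    one_day = out.groupby(id_col)[date_col].diff().eq(pd.Timedelta(days=1))
    # Shift within each ID, then keep a value only where the gap is exactly one day.
    shifted = out.groupby(id_col)[var_cols].shift()
    for col in var_cols:
        out[col + suffix] = shifted[col].where(one_day)
    return out


# First rows of the sample data from the question.
sample = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B"],
    "Date": ["16-07-19", "17-07-19", "20-07-19", "21-07-19", "01-07-19", "03-07-19"],
    "X": [3, 9, 2, 3, 4, 2],
    "123_Var": [777, 637, 295, 778, 509, 413],
    "456_Var": [250, 121, 272, 600, 690, 725],
    "789_Var": [810, 529, 490, 544, 406, 414],
})
result = shift_if_consecutive(sample)
print(result)
```

Working on a copy leaves the original frame untouched, which makes it easier to compare the result against the expected output in the question.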
Answer 1: (score: 0)
You might want to consider this approach using iterrows():
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')
for index, row in df.iterrows():
    if index == 0:
        continue  # the first row has no previous row
    prev = df.loc[index - 1]  # assumes the default 0..n-1 integer index
    if row['ID'] == prev['ID'] and row['Date'] == prev['Date'] + pd.Timedelta(days=1):
        for col in ['123_Var', '456_Var', '789_Var']:
            df.loc[index, col + '_S'] = prev[col]
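The loop above depends on integer index labels, so that `index-1` is the previous row. A rough sketch of the same row-by-row idea that tracks the previous row explicitly, and so also works with other index labels, might look like the following. The function name `shift_rowwise` and its parameters are hypothetical, and the rows are assumed to be sorted by ID and date already.

```python
import numpy as np
import pandas as pd


def shift_rowwise(df, var_cols, id_col="ID", date_col="Date", suffix="_S"):
    """Hypothetical helper: row-by-row version of the one-day conditional shift."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col], format="%d-%m-%y")
    new_values = {col + suffix: [] for col in var_cols}
    prev_id, prev_date, prev_vals = None, None, None
    for pos in range(len(out)):
        cur_id = out[id_col].iloc[pos]
        cur_date = out[date_col].iloc[pos]
        # Copy the previous row's values only for the same ID and a one-day gap.
        ok = prev_id == cur_id and cur_date - prev_date == pd.Timedelta(days=1)
        for col in var_cols:
            new_values[col + suffix].append(float(prev_vals[col]) if ok else np.nan)
        prev_id, prev_date = cur_id, cur_date
        prev_vals = {col: out[col].iloc[pos] for col in var_cols}
    for name, values in new_values.items():
        out[name] = values
    return out


# A few rows from the question, given a non-default index on purpose.
sample = pd.DataFrame(
    {
        "ID": ["C", "C", "C", "C", "D"],
        "Date": ["14-08-19", "15-08-19", "17-08-19", "18-08-19", "10-12-18"],
        "123_Var": [912, 586, 500, 982, 170],
    },
    index=[10, 20, 30, 40, 50],
)
result = shift_rowwise(sample, ["123_Var"])
print(result)
```

A Python-level loop like this is usually much slower than the grouped, vectorized approach on large frames, so it is mainly useful for small data or for checking results.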