Shift values to the next day

Time: 2020-01-25 20:45:20

Tags: pandas

I have this DataFrame:

ID      Date  X  123_Var  456_Var  789_Var
 A  16-07-19  3      777      250      810
 A  17-07-19  9      637      121      529
 A  20-07-19  2      295      272      490
 A  21-07-19  3      778      600      544
 A  22-07-19  6      741      792      907
 B  01-07-19  4      509      690      406
 B  03-07-19  2      413      725      414
 B  04-07-19  2      170      702      912
 B  09-08-19  3      851      616      477
 B  10-08-19  9      475      447      555
 B  11-08-19  1      412      403      708
 B  12-08-19  2      299      537      321
 B  13-08-19  4      310      119      125
 C  14-08-19  4      912      755      657
 C  15-08-19  4      586      771      394
 C  17-08-19  2      500      528      764
 C  18-08-19  1      982      383      654
 C  20-08-19  3      336      691      496
 C  21-08-19  3      206      433      263
 C  22-08-19  2      373      319      111
 D  10-12-18  2      170      702      912
 E  10-12-18  2      912      755      657
 E  14-12-18  2      373      319      111

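I want to shift the values in each of the columns 123_Var, 456_Var, and 789_Var down to the next row.

A value is shifted only when the two dates are exactly one day apart; otherwise the new column keeps NaN.

The shift should be done separately for each ID (via groupby).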

Expected result:

ID      Date  X  123_Var  456_Var  789_Var  123_Var_S  456_Var_S  789_Var_S
 A  16-07-19  3      777      250      810        NaN        NaN        NaN
 A  17-07-19  9      637      121      529      777.0      250.0      810.0
 A  20-07-19  2      295      272      490        NaN        NaN        NaN
 A  21-07-19  3      778      600      544      295.0      272.0      490.0
 A  22-07-19  6      741      792      907      778.0      600.0      544.0
 B  01-07-19  4      509      690      406        NaN        NaN        NaN
 B  03-07-19  2      413      725      414        NaN        NaN        NaN
 B  04-07-19  2      170      702      912      413.0      725.0      414.0
 B  09-08-19  3      851      616      477        NaN        NaN        NaN
 B  10-08-19  9      475      447      555      851.0      616.0      477.0
 B  11-08-19  1      412      403      708      475.0      447.0      555.0
 B  12-08-19  2      299      537      321      412.0      403.0      708.0
 B  13-08-19  4      310      119      125      299.0      537.0      321.0
 C  14-08-19  4      912      755      657        NaN        NaN        NaN
 C  15-08-19  4      586      771      394      912.0      755.0      657.0
 C  17-08-19  2      500      528      764        NaN        NaN        NaN
 C  18-08-19  1      982      383      654      500.0      528.0      764.0
 C  20-08-19  3      336      691      496        NaN        NaN        NaN
 C  21-08-19  3      206      433      263      336.0      691.0      496.0
 C  22-08-19  2      373      319      111      206.0      433.0      263.0
 D  10-12-18  2      170      702      912        NaN        NaN        NaN
 E  10-12-18  2      912      755      657        NaN        NaN        NaN
 E  14-12-18  2      373      319      111        NaN        NaN        NaN

2 answers:

Answer 0 (score: 1)

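IIUC, we can group by ID, build a boolean mask of rows that come exactly one day after the previous row, and use shift together with .loc to assign your values: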

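import numpy as np
import pandas as pd

# Parse the day-month-year strings into datetimes
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')

# True where the previous row of the same ID is exactly one day earlier
s = df.groupby('ID')['Date'].diff().eq(pd.Timedelta(days=1))

# New column names: 123_Var -> 123_Var_S, and so on
cols = df.filter(like='Var').columns.map(lambda x : x + '_S')

# Shift every Var column down one row, then blank the rows that fail the one-day test
df[cols]  = df.filter(like='Var').shift()

df.loc[~s,cols]= np.nan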
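As a self-contained check of the idea above, here is a minimal sketch on a small subset of the question's data. It is a variant, not the answer's exact code: it shifts inside each group with groupby(...).shift() and attaches the result with join. The names var_cols, shifted, one_day, and result are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Subset of the question's rows (IDs A, B, C), enough to hit each case
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B", "C"],
    "Date": ["16-07-19", "17-07-19", "20-07-19", "21-07-19",
             "12-08-19", "13-08-19", "14-08-19"],
    "123_Var": [777, 637, 295, 778, 299, 310, 912],
    "456_Var": [250, 121, 272, 600, 537, 119, 755],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%y")

var_cols = [c for c in df.columns if c.endswith("_Var")]

# Previous row's values within each ID (NaN on the first row of every group)
shifted = df.groupby("ID")[var_cols].shift()

# True only where the previous row of the same ID is exactly one day earlier
one_day = df.groupby("ID")["Date"].diff().eq(pd.Timedelta(days=1))

# Blank out rows that follow a gap, then attach the columns with the _S suffix
shifted.loc[~one_day, :] = np.nan
result = df.join(shifted.add_suffix("_S"))
print(result)
```

Note that the C row dated 14-08-19 comes one day after B's last row, yet it stays NaN because both the shift and the date difference are computed per ID.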

Answer 1 (score: 0)

You may want to consider this approach using iterrows():

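# Assumes a default 0..n-1 integer index and that 'Date' is already a datetime column
for index, row in df.iterrows():
    if index == 0:
        continue  # the first row has no predecessor
    prev = df.loc[index - 1]
    if prev['ID'] == row['ID'] and row['Date'] == prev['Date'] + pd.Timedelta(days=1):
        df.loc[index, '123_Var_S'] = prev['123_Var']
        df.loc[index, '456_Var_S'] = prev['456_Var']
        df.loc[index, '789_Var_S'] = prev['789_Var']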
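If a row-by-row loop is preferred, one possible refinement is to collect the shifted values in a list and assign the column once, rather than writing cell by cell with .loc. The sketch below does this for a single column on a few rows from the question. It uses itertuples with plain tuples because '123_Var' is not a valid attribute name, and the variables one_day, shifted, and prev are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# A few rows from the question, including the B -> C boundary
df = pd.DataFrame({
    "ID": ["B", "B", "C", "C", "C"],
    "Date": pd.to_datetime(
        ["12-08-19", "13-08-19", "14-08-19", "15-08-19", "17-08-19"],
        format="%d-%m-%y",
    ),
    "123_Var": [299, 310, 912, 586, 500],
})

one_day = pd.Timedelta(days=1)
shifted = []   # one entry per row, in row order
prev = None    # previous row as a plain (ID, Date, value) tuple

for row in df[["ID", "Date", "123_Var"]].itertuples(index=False, name=None):
    row_id, date, value = row
    # Carry the previous value only for the same ID and a gap of exactly one day
    if prev is not None and prev[0] == row_id and date == prev[1] + one_day:
        shifted.append(prev[2])
    else:
        shifted.append(np.nan)
    prev = row

df["123_Var_S"] = shifted
print(df)
```

This version does not depend on the index being 0..n-1, since it never looks rows up by label.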