Difference in one column based on the difference in another column (pandas)

Date: 2018-05-22 08:56:50

Tags: python pandas series pandas-groupby timedelta

How can I do the following with pandas?

I have this DataFrame:

weight   |  Date                  |  dateDay
43       | 09/03/2018  08:48:48   |  09/03/2018
30       | 10/03/2018  23:28:48   |  10/03/2018
45       | 12/03/2018  04:21:44   |  12/03/2018
25       | 17/03/2018  00:23:32   |  17/03/2018
35       | 18/03/2018  04:49:01   |  18/03/2018
39       | 19/03/2018  20:14:37   |  19/03/2018

I want this:

weight   |  Date                  |  dateDay     |  Fun_Cum
43       | 09/03/2018  08:48:48   |  09/03/2018  |    NULL
30       | 10/03/2018  23:28:48   |  10/03/2018  |    -13
45       | 12/03/2018  04:21:44   |  12/03/2018  |    NULL
25       | 17/03/2018  00:23:32   |  17/03/2018  |    NULL
35       | 18/03/2018  04:49:01   |  18/03/2018  |     10
39       | 19/03/2018  20:14:37   |  19/03/2018  |      4

Pseudocode:

If Day does not directly follow Day-1 => Fun_Cum is NULL;

Otherwise: (weight on Day) - (weight on Day-1)

Thanks

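2 answers:

Answer 0: (score: 1)

Here is one way, using pd.Series.diff and pd.Series.shift. You can subtract consecutive datetime elements and read the gap in days from the pd.Series.dt.days attribute. This assumes dateDay already has a datetime dtype.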

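import numpy as np

# Difference between each weight and the previous row's weight
df['Fun_Cum'] = df['weight'].diff()

# Keep the difference only where the previous row is exactly one day earlier
df.loc[(df.dateDay - df.dateDay.shift()).dt.days != 1, 'Fun_Cum'] = np.nan

print(df)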

   weight       Date    dateDay  Fun_Cum
0      43 2018-03-09 2018-03-09      NaN
1      30 2018-03-10 2018-03-10    -13.0
2      45 2018-03-12 2018-03-12      NaN
3      25 2018-03-17 2018-03-17      NaN
4      35 2018-03-18 2018-03-18     10.0
5      39 2018-03-19 2018-03-19      4.0
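For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. It assumes the question's Date column starts out as day-first strings; the `gap_days` helper name and the use of `dt.normalize()` to derive dateDay are my own choices, not part of the answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; dates are day-first strings.
df = pd.DataFrame({
    "weight": [43, 30, 45, 25, 35, 39],
    "Date": [
        "09/03/2018 08:48:48", "10/03/2018 23:28:48", "12/03/2018 04:21:44",
        "17/03/2018 00:23:32", "18/03/2018 04:49:01", "19/03/2018 20:14:37",
    ],
})

# Parse the timestamps explicitly as day/month/year.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M:%S")
# Truncate to midnight so the gap between rows is a whole number of days.
df["dateDay"] = df["Date"].dt.normalize()

# Difference from the previous row's weight.
df["Fun_Cum"] = df["weight"].diff()

# Blank out rows whose previous row is not exactly one calendar day earlier.
# The first row has no predecessor, so its gap is NaN and it is blanked too.
gap_days = df["dateDay"].diff().dt.days
df.loc[gap_days != 1, "Fun_Cum"] = np.nan

print(df)
```

Computing the gap on dateDay rather than on Date matters here: the full timestamps for 09/03 and 10/03 are more than 24 hours apart, but they are still consecutive calendar days.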

Answer 1: (score: 1)

#import pandas as pd
#from datetime import datetime
#to_datetime = lambda d: datetime.strptime(d, '%d/%m/%Y')
#df = pd.read_csv('d.csv', converters={'dateDay': to_datetime})

The part above is only needed if you are reading from a file; otherwise .shift() is all you need:

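a = df
b = df.shift()  # previous row's values
# where() leaves NaN on rows that do not follow the previous day;
# multiplying by the boolean mask would give 0 there instead
df["Fun_Cum"] = (a.weight - b.weight).where((a.dateDay - b.dateDay).dt.days == 1)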
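A runnable sketch of the shift-based approach, under a few assumptions: an in-memory string stands in for the file d.csv, the two-column layout is made up for illustration, and the dates are parsed with pd.to_datetime after loading rather than with the converter shown above. It also contrasts two ways of applying the consecutive-day condition, since they differ on the rows the question wants as NULL.

```python
import io

import pandas as pd

# Stand-in for 'd.csv'; the column layout is an assumption for illustration.
csv_text = """weight,dateDay
43,09/03/2018
30,10/03/2018
45,12/03/2018
25,17/03/2018
35,18/03/2018
39,19/03/2018
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse day/month/year strings into a datetime column.
df["dateDay"] = pd.to_datetime(df["dateDay"], format="%d/%m/%Y")

prev = df.shift()  # every column moved down one row
is_next_day = (df["dateDay"] - prev["dateDay"]).dt.days == 1
delta = df["weight"] - prev["weight"]

# Series.where keeps the value where the condition holds and NaN elsewhere.
df["Fun_Cum"] = delta.where(is_next_day)

# Multiplying by the boolean mask gives 0 (not NaN) on non-consecutive days.
zero_filled = delta * is_next_day

print(df)
```

Which variant to use depends on whether a gap in the dates should read as "no value" (NaN) or as "no change" (0); the expected output in the question calls for the former.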