在pandas数据帧中的组内添加timedelta数据

时间:2016-08-06 15:13:54

标签: python pandas dataframe group-by timedelta

我正在处理pandas中的数据框,其中有四列user_idtime_stamp1time_stamp2interval。 Time_stamp1和time_stamp2的类型为datetime64 [ns],interval的类型为timedelta64 [ns]。

我想总结数据框中每个user_id的间隔值,并尝试以多种方式计算它:

1)df["duration"]=   df.groupby('user_id')['interval'].apply (lambda x: x.sum())

2)df ["duration"]=  df.groupby('user_id').aggregate (np.sum)

3)df ["duration"]=  df.groupby('user_id').agg (np.sum)

但在运行代码后,它们都不起作用,duration的值将为NaT

1 个答案:

答案 0 :(得分:0)

更新:您可以使用transform()方法:

In [291]: df['duration'] = df.groupby('user_id')['interval'].transform('sum')

In [292]: df
Out[292]:
                    a  user_id                   b          interval           duration
0 2016-01-01 00:00:00     0.01 2015-11-11 00:00:00  51 days 00:00:00  838 days 08:00:00
1 2016-03-10 10:39:00     0.01 2015-12-08 18:39:00               NaT  838 days 08:00:00
2 2016-05-18 21:18:00     0.01 2016-01-05 13:18:00 134 days 08:00:00  838 days 08:00:00
3 2016-07-27 07:57:00     0.01 2016-02-02 07:57:00 176 days 00:00:00  838 days 08:00:00
4 2016-10-04 18:36:00     0.01 2016-03-01 02:36:00 217 days 16:00:00  838 days 08:00:00
5 2016-12-13 05:15:00     0.01 2016-03-28 21:15:00 259 days 08:00:00  838 days 08:00:00
6 2017-02-20 15:54:00     0.02 2016-04-25 15:54:00 301 days 00:00:00 1454 days 00:00:00
7 2017-05-01 02:33:00     0.02 2016-05-23 10:33:00 342 days 16:00:00 1454 days 00:00:00
8 2017-07-09 13:12:00     0.02 2016-06-20 05:12:00 384 days 08:00:00 1454 days 00:00:00
9 2017-09-16 23:51:00     0.02 2016-07-17 23:51:00 426 days 00:00:00 1454 days 00:00:00

OLD回答:

演示:

In [260]: df
Out[260]:
                    a                   b          interval  user_id
0 2016-01-01 00:00:00 2015-11-11 00:00:00  51 days 00:00:00        1
1 2016-03-10 10:39:00 2015-12-08 18:39:00               NaT        1
2 2016-05-18 21:18:00 2016-01-05 13:18:00 134 days 08:00:00        1
3 2016-07-27 07:57:00 2016-02-02 07:57:00 176 days 00:00:00        1
4 2016-10-04 18:36:00 2016-03-01 02:36:00 217 days 16:00:00        1
5 2016-12-13 05:15:00 2016-03-28 21:15:00 259 days 08:00:00        1
6 2017-02-20 15:54:00 2016-04-25 15:54:00 301 days 00:00:00        2
7 2017-05-01 02:33:00 2016-05-23 10:33:00 342 days 16:00:00        2
8 2017-07-09 13:12:00 2016-06-20 05:12:00 384 days 08:00:00        2
9 2017-09-16 23:51:00 2016-07-17 23:51:00 426 days 00:00:00        2

In [261]: df.dtypes
Out[261]:
a            datetime64[ns]
b            datetime64[ns]
interval    timedelta64[ns]
user_id               int64
dtype: object

In [262]: df.groupby('user_id')['interval'].sum()
Out[262]:
user_id
1    838 days 08:00:00
2   1454 days 00:00:00
Name: interval, dtype: timedelta64[ns]

In [263]: df.groupby('user_id')['interval'].apply(lambda x: x.sum())
Out[263]:
user_id
1    838 days 08:00:00
2   1454 days 00:00:00
Name: interval, dtype: timedelta64[ns]

In [264]: df.groupby('user_id').agg(np.sum)
Out[264]:
                  interval
user_id
1        838 days 08:00:00
2       1454 days 00:00:00

请检查您的数据......