Sum of a Timedeltas column in Pandas Python

Time: 2016-06-20 10:44:25

Tags: python pandas sum timedelta

A similar question has been asked before here.

But when I tried all the solutions offered there, they gave me errors.

Code:

print sum(data['Activity_Duration'],datetime.timedelta())  
#import operator  
#print reduce(operator.add, data['Activity_Duration'])  

Error:

OverflowError

      1 #print sum(data['Activity_Duration'],datetime.timedelta())
      2 import operator
----> 3 print reduce(operator.add,data['Activity_Duration'])

OverflowError: too large to convert

Am I missing something, or can we come up with a more scalable solution?

Info: my data has 436,746 rows.

I am using 8 machines, and the data size is 650 MB.

1 Answer:

Answer 0 (score: 3)

I think you need sum:

print (df['Activity_Duration'].sum())

Sample:

import pandas as pd

start = pd.to_datetime('2015-02-24')
end = pd.to_datetime('2016-04-25')
rng = pd.date_range(start, end, freq='6D')

start = pd.to_datetime('2015-02-26')
end = pd.to_datetime('2016-04-27')
rng1 = pd.date_range(start, end, freq='6D')

df = pd.DataFrame({'Date1': rng, 'Date2': rng1})  

df['Activity_Duration'] = df.Date2 - df.Date1
print (df)
        Date1      Date2  Activity_Duration
0  2015-02-24 2015-02-26             2 days
1  2015-03-02 2015-03-04             2 days
2  2015-03-08 2015-03-10             2 days
3  2015-03-14 2015-03-16             2 days
4  2015-03-20 2015-03-22             2 days
5  2015-03-26 2015-03-28             2 days
6  2015-04-01 2015-04-03             2 days
7  2015-04-07 2015-04-09             2 days
8  2015-04-13 2015-04-15             2 days
9  2015-04-19 2015-04-21             2 days
...
...


print (df['Activity_Duration'].sum())
144 days 00:00:00

If you need float output:

import numpy as np

df['Activity_Duration'] = (df.Date2 - df.Date1) / np.timedelta64(1, 'D')
print (df)
        Date1      Date2  Activity_Duration
0  2015-02-24 2015-02-26                2.0
1  2015-03-02 2015-03-04                2.0
2  2015-03-08 2015-03-10                2.0
3  2015-03-14 2015-03-16                2.0
4  2015-03-20 2015-03-22                2.0
...
...
...

print (df['Activity_Duration'].sum())
144.0
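The same division works for any unit: dividing by a one-unit np.timedelta64 expresses each duration as a float in that unit. A minimal sketch, assuming hours are wanted instead of days; the durations below are hypothetical stand-ins for data['Activity_Duration']:

```python
import numpy as np
import pandas as pd

# Hypothetical durations; in the question these come from data['Activity_Duration']
durations = pd.Series(pd.to_timedelta(['2 days', '1 days 12:00:00', '06:00:00']))

# Dividing a timedelta64 Series by a one-hour timedelta64 gives floats in hours
hours = durations / np.timedelta64(1, 'h')
print(hours.tolist())  # [48.0, 36.0, 6.0]
print(hours.sum())     # 90.0
```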

Another solution is dt.days; the output is int:

print (df['Activity_Duration'].dt.days.sum())
144
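One caveat, shown with hypothetical values: .dt.days keeps only the whole-day component of each duration, so when durations include hours, summing dt.days can give less than the true total. In the sample above every duration is exactly 2 days, so both approaches agree there.

```python
import pandas as pd

# Hypothetical durations that are not whole days
durations = pd.Series(pd.to_timedelta(['1 days 12:00:00', '1 days 12:00:00']))

# .dt.days keeps only the whole-day part of each value, dropping the 12 hours
print(durations.dt.days.tolist())  # [1, 1]
print(durations.dt.days.sum())     # 2

# Summing the timedeltas first keeps the hours
print(durations.sum())             # 3 days 00:00:00
```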

See also: Timedelta limitations
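pandas stores timedelta64[ns] values as 64-bit integer nanoseconds, which caps a single Timedelta at roughly 292 years. With 436,746 rows, a total above that cap is plausible and may be what triggers the OverflowError in the question; this is an assumption, since the data itself is not shown. A minimal sketch that sidesteps the cap by converting each duration to float seconds before summing; the 300,000-row Series is hypothetical:

```python
import pandas as pd

# The largest duration a pandas Timedelta (int64 nanoseconds) can hold
print(pd.Timedelta.max)  # 106751 days 23:47:16.854775807

# Hypothetical data: 300,000 activities of 12 hours each (150,000 days in total),
# which is more than Timedelta.max can represent
durations = pd.Series([pd.Timedelta(hours=12)] * 300_000)

# Convert each value to float seconds first, then sum the floats,
# so the running total never has to fit in a Timedelta
total_seconds = durations.dt.total_seconds().sum()
total_days = total_seconds / 86400
print(total_days)  # 150000.0
```

The trade-off is that float sums can lose sub-second precision on very large totals, which is usually acceptable for activity durations.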