Summing Timedeltas in Python pandas

Time: 2016-04-21 10:15:02

Tags: python pandas timedelta

I am trying to sum a column of Timedeltas in pandas. The sum works on a slice of the column, but not on the whole column.

>> d.ix[0:100, 'VOID-DAYS'].sum()
Timedelta('2113 days 00:00:00')

>> d['VOID-DAYS'].sum()
ValueError: overflow in timedelta operation

1 answer:

Answer 0: (score: 5)

If VOID-DAYS represents a whole number of days, convert the Timedeltas to integers before summing:

df['VOID-DAYS'] = df['VOID-DAYS'].dt.days

import numpy as np
import pandas as pd
df = pd.DataFrame({'VOID-DAYS': pd.to_timedelta(np.ones((106752,)), unit='D')})
try:
    print(df['VOID-DAYS'].sum())
except ValueError as err:
    print(err)
    # overflow in timedelta operation


df['VOID-DAYS'] = df['VOID-DAYS'].dt.days
print(df['VOID-DAYS'].sum())
# 106752

If the Timedeltas include seconds or smaller units, use

df['VOID-DAYS'] = df['VOID-DAYS'].dt.total_seconds()

to convert the values to floats.
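As a rough illustration of the float route, the sketch below sums a column whose total (300000 days) lies beyond the Timedelta range. The column values and the variable names are made up for this example and are not from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200000 entries of 1.5 days each (not from the question).
df = pd.DataFrame(
    {'VOID-DAYS': pd.to_timedelta(np.full(200_000, 1.5), unit='D')}
)

# Convert each Timedelta to float seconds, then sum the floats.
# The sum is an ordinary float, so the Timedelta range does not apply.
total_seconds = df['VOID-DAYS'].dt.total_seconds().sum()

# Convert back to days for reporting.
total_days = total_seconds / 86400
print(total_seconds, total_days)
```

Because the result is a float, very large or sub-second totals can lose precision; for whole days the integer conversion shown earlier avoids that.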

Pandas Timedeltas (Series and TimedeltaIndexes) store all timedeltas as integers compatible with NumPy's timedelta64[ns] dtype. This dtype uses an 8-byte integer to store the timedelta, in nanoseconds.

The largest whole number of days representable in this format is

In [73]: int(float(np.iinfo(np.int64).max) / (10**9 * 3600 * 24))
Out[73]: 106751
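The same limit can be cross-checked with integer arithmetic and against `pd.Timedelta.max`. This is a small sketch; `NS_PER_DAY` and `max_days` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Nanoseconds in one day.
NS_PER_DAY = 10**9 * 3600 * 24

# Largest whole number of days that fits in a signed 64-bit nanosecond count.
max_days = int(np.iinfo(np.int64).max) // NS_PER_DAY
print(max_days)

# pandas exposes the largest representable Timedelta directly.
print(pd.Timedelta.max)
print(pd.Timedelta.max.days == max_days)
```

Integer floor division avoids the round trip through float used above, though both give the same answer here.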

That is why

In [74]: pd.Series(pd.to_timedelta(np.ones((106752,)), unit='D')).sum()
ValueError: overflow in timedelta operation

raises a ValueError, but

In [75]: pd.Series(pd.to_timedelta(np.ones((106751,)), unit='D')).sum()
Out[75]: Timedelta('106751 days 00:00:00')

does not.
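If the total has to stay a duration rather than a plain number, one possible workaround (not part of the answer above, only a sketch) is to sum Python `datetime.timedelta` objects, which have a much larger range but only microsecond resolution. The variable names below are illustrative.

```python
import datetime

import numpy as np
import pandas as pd

# Same series that overflows when summed as timedelta64[ns].
s = pd.Series(pd.to_timedelta(np.ones((106752,)), unit='D'))

# Convert each pandas Timedelta to a standard-library timedelta and sum those.
# Note: to_pytimedelta() drops any nanosecond component.
total = sum((td.to_pytimedelta() for td in s), datetime.timedelta())
print(total.days)
```

This loops in Python, so it is slower than a vectorized sum; it is meant only for cases where the total does not fit in timedelta64[ns].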