Error when performing operations on a DatetimeIndexResampler object

Time: 2019-11-19 09:53:36

Tags: python-3.x pandas dataframe time-series

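I have a time-series DataFrame and want to find the difference between the date in each record and the last (maximum) date in that DataFrame. However, I get an error: TypeError: unsupported operand type(s) for -: 'DatetimeIndex' and 'SeriesGroupBy'. Judging from the error, the object is not of the "right" type to allow these operations. How can I avoid this, or convert the data to some other format so that the operation can be performed? Below is sample code that reproduces the error: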

import pandas as pd

df = pd.DataFrame([[54.7,36.3,'2010-07-20'],[54.7,36.3,'2010-07-21'],[52.3,38.7,'2010-07-26'],[52.3,38.7,'2010-07-30']],
                  columns=['col1','col2','date'])
df.date = pd.to_datetime(df.date)
df.index = df.date
df = df.resample('D')
print(type(df))
diff = (df.date.max() - df.date).values

1 answer:

Answer 0 (score: 1)

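I think you first need to create a DatetimeIndex with DataFrame.set_index, and then aggregate the resampled data, for example with max, which gives you consecutive daily values: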

df = pd.DataFrame([[54.7,36.3,'2010-07-20'],
                   [54.7,36.3,'2010-07-21'],
                   [52.3,38.7,'2010-07-26'],
                   [52.3,38.7,'2010-07-30']],
              columns=['col1','col2','date'])

df.date = pd.to_datetime(df.date)

df1 = df.set_index('date').resample('D').max()
# alternative if there are no duplicated datetimes
#df1 = df.set_index('date').asfreq('D')
print (df1)
            col1  col2
date                  
2010-07-20  54.7  36.3
2010-07-21  54.7  36.3
2010-07-22   NaN   NaN
2010-07-23   NaN   NaN
2010-07-24   NaN   NaN
2010-07-25   NaN   NaN
2010-07-26  52.3  38.7
2010-07-27   NaN   NaN
2010-07-28   NaN   NaN
2010-07-29   NaN   NaN
2010-07-30  52.3  38.7
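
The aggregation step is what the code in the question was missing. As a minimal sketch (the variable names `resampler`, `is_frame_before`, `is_frame_after` and `n_rows` are only illustrative), calling resample alone returns a lazy Resampler object rather than a DataFrame, and only an aggregation such as max produces data you can do arithmetic on:

```python
import pandas as pd

df = pd.DataFrame([[54.7, 36.3, '2010-07-20'],
                   [54.7, 36.3, '2010-07-21'],
                   [52.3, 38.7, '2010-07-26'],
                   [52.3, 38.7, '2010-07-30']],
                  columns=['col1', 'col2', 'date'])
df['date'] = pd.to_datetime(df['date'])

# resample() on its own only describes the daily grouping; nothing is computed yet
resampler = df.set_index('date').resample('D')
is_frame_before = isinstance(resampler, pd.DataFrame)  # False: still a Resampler

# an aggregation such as max() turns the Resampler into an ordinary DataFrame
df1 = resampler.max()
is_frame_after = isinstance(df1, pd.DataFrame)  # True

# one row per calendar day from 2010-07-20 to 2010-07-30 inclusive
n_rows = len(df1)
print(type(resampler).__name__, is_frame_before, is_frame_after, n_rows)
```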

Then subtract the index from its maximum value and convert the timedeltas to days with TimedeltaIndex.days:

df1['diff'] =  (df1.index.max() - df1.index).days
print (df1)
            col1  col2  diff
date                        
2010-07-20  54.7  36.3    10
2010-07-21  54.7  36.3     9
2010-07-22   NaN   NaN     8
2010-07-23   NaN   NaN     7
2010-07-24   NaN   NaN     6
2010-07-25   NaN   NaN     5
2010-07-26  52.3  38.7     4
2010-07-27   NaN   NaN     3
2010-07-28   NaN   NaN     2
2010-07-29   NaN   NaN     1
2010-07-30  52.3  38.7     0
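
If the daily gaps are not needed and only the original records matter, the resampling step could probably be skipped altogether. The following is a sketch under that assumption, not part of the answer above: it subtracts the date column from its maximum and reads the day counts through the Series .dt accessor.

```python
import pandas as pd

df = pd.DataFrame([[54.7, 36.3, '2010-07-20'],
                   [54.7, 36.3, '2010-07-21'],
                   [52.3, 38.7, '2010-07-26'],
                   [52.3, 38.7, '2010-07-30']],
                  columns=['col1', 'col2', 'date'])
df['date'] = pd.to_datetime(df['date'])

# Timestamp minus a datetime Series gives a timedelta Series;
# .dt.days extracts the whole-day component of each element
df['diff'] = (df['date'].max() - df['date']).dt.days
print(df)
```

Here the result keeps the four original rows, whereas the resampled version above has one row per calendar day.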