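This is a follow-up to the question about counting activity between two datetimes, which was answered very well here: Create a Pandas dataframe with counts of items spanning a date range
The remaining problem is the last month. After the two tables are summed and subtracted, ['END_DATE'] ends up showing zero for that month. This is mathematically correct, because every item has an end date in the current month or earlier. In this case, though, the items were active for at least part of their final month, so it would be more accurate to add one month to END_DATE so that they show as active in the month they end (H2 is the dataframe).
The code is: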
ends = H2['END_DATE'].apply(lambda t: t.to_period(freq='m')).value_counts()
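To make the shape of `ends` concrete, here is a minimal sketch with a small hypothetical stand-in for `H2` (the three dates are invented for illustration and only the `END_DATE` column is assumed). It uses the vectorised `.dt.to_period` accessor, which should be equivalent to the `apply(lambda t: t.to_period(...))` call above, and sorts the result by month because `value_counts` orders by count.

```python
import pandas as pd

# Hypothetical stand-in for H2: only the END_DATE column matters here,
# and it is assumed to be a datetime64 column.
H2 = pd.DataFrame({
    "END_DATE": pd.to_datetime(["2000-05-17", "2000-09-03", "2001-06-30"]),
})

# Vectorised equivalent of .apply(lambda t: t.to_period(freq='m')):
# the .dt accessor converts the whole column to monthly periods at once.
ends = H2["END_DATE"].dt.to_period("M").value_counts().sort_index()
print(ends)
```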
I tried using rollforward and DateOffset(month = 1). For example, with DateOffset:
ends = (H2['END_DATE'].DateOffset(months=1)).apply(lambda t: t.to_period(freq='m')).value_counts()
This gives the following error:
AttributeError: 'Series' object has no attribute 'DateOffset'
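The error arises because `DateOffset` is a pandas class (`pd.DateOffset`), not a method of `Series`; an offset is applied by adding it to the datetime Series. Note also that `months=1` (plural) adds a calendar month, while `month=1` (singular), as written in the attempt above, replaces the month field with January. A minimal sketch using the same hypothetical sample dates:

```python
import pandas as pd

# Hypothetical stand-in for H2 (dates invented for illustration).
H2 = pd.DataFrame({
    "END_DATE": pd.to_datetime(["2000-05-17", "2000-09-03", "2001-06-30"]),
})

# Add the offset to the Series instead of calling it as a method.
# months=1 (plural) shifts each date one calendar month later.
shifted = H2["END_DATE"] + pd.DateOffset(months=1)

# Bin the shifted dates by month and count them.
ends = shifted.dt.to_period("M").value_counts().sort_index()
print(ends)
```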
Answer (score: 4)
The simplest approach is to add one (month) to the PeriodIndex:
In [21]: ends
Out[21]:
2000-05 1
2000-09 1
2001-06 1
Freq: M, dtype: int64
In [22]: ends.index = ends.index + 1
In [23]: ends
Out[23]:
2000-06 1
2000-10 1
2001-07 1
Freq: M, dtype: int64
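As a self-contained version of the session above (sample dates hypothetical): adding an integer to a monthly PeriodIndex advances each period by that many months. Because the shift happens after the dates are already binned into months, it sidesteps day-of-month details such as month-end clipping that come with shifting the raw dates.

```python
import pandas as pd

# Hypothetical END_DATE values, invented for illustration.
end_dates = pd.Series(pd.to_datetime(["2000-05-17", "2000-09-03", "2001-06-30"]))
ends = end_dates.dt.to_period("M").value_counts().sort_index()

# Adding 1 to a monthly PeriodIndex moves every period one month later.
ends.index = ends.index + 1
print(ends)
```

This should give the same months as adding `pd.DateOffset(months=1)` to the dates before converting them to periods.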
My original suggestion was to shift after reindexing (since you need to reindex anyway):
In [11]: ends
Out[11]:
2000-05 1
2000-09 1
2001-06 1
Freq: M, dtype: int64
In [12]: p = pd.period_range(start='2000-01', periods=19, freq='M')  # Note: needs to be one period longer than before
In [13]: sparse_ends = ends.reindex(p)
In [14]: sparse_ends.shift(1)
Out[14]:
2000-01 NaN
2000-02 NaN
2000-03 NaN
2000-04 NaN
2000-05 NaN
2000-06 1
2000-07 NaN
2000-08 NaN
2000-09 NaN
2000-10 1
2000-11 NaN
2000-12 NaN
2001-01 NaN
2001-02 NaN
2001-03 NaN
2001-04 NaN
2001-05 NaN
2001-06 NaN
2001-07 1
Freq: M, dtype: float64
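A self-contained sketch of this reindex-then-shift approach, again with hypothetical sample dates. It assumes the dense month range runs from 2000-01 to 2001-07 (one period past the last end month, as the note above requires), and it fills the empty months with zero so the result can be summed and subtracted against a matching starts series without producing NaN.

```python
import pandas as pd

# Hypothetical END_DATE values, invented for illustration.
end_dates = pd.Series(pd.to_datetime(["2000-05-17", "2000-09-03", "2001-06-30"]))
ends = end_dates.dt.to_period("M").value_counts()

# Dense monthly range, one period longer than the data so that the
# shifted final month (2001-07) still has a slot.
p = pd.period_range(start="2000-01", periods=19, freq="M")

# Reindex onto the full range (months with no endings become NaN),
# shift the values down one position (one month later), then fill gaps with 0.
sparse_ends = ends.reindex(p)
dense_ends = sparse_ends.shift(1).fillna(0).astype(int)
print(dense_ends[dense_ends > 0])
```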