Question

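I run into this a lot when modeling time series. Sometimes data are reported at different frequencies, for example one series daily and another weekly. What I want is not to forward-fill the weekly data point into every day of the week (it is usually already the sum of all values for that week), but to replace each day with the average of that value over the period it covers. In essence, I want to spread the data out.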

So, if I have

import numpy as np
import pandas as pd
s = pd.Series(index=pd.date_range('2015/1/1', '2015/1/9'),
              data=[2, np.nan, 6, np.nan, np.nan, 2, np.nan, np.nan, np.nan])

then I would like to get back

2015-01-01     1
2015-01-02     1
2015-01-03     2
2015-01-04     2
2015-01-05     2
2015-01-06   0.5
2015-01-07   0.5
2015-01-08   0.5
2015-01-09   0.5
Freq: D, dtype: float64

Any ideas on a simple way to do this? Is a for-loop unavoidable?

Answer 1

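Here is one way: use .cumsum on the non-null mask to split the series into groups (each reported value plus the NaNs that follow it), then use transform to divide each forward-filled value by the size of its group.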

s.ffill().groupby(s.notnull().cumsum()).transform(lambda g: g/len(g))

2015-01-01    1.0
2015-01-02    1.0
2015-01-03    2.0
2015-01-04    2.0
2015-01-05    2.0
2015-01-06    0.5
2015-01-07    0.5
2015-01-08    0.5
2015-01-09    0.5
Freq: D, dtype: float64
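
To make the grouping step easier to follow, here is a minimal sketch that builds the same result in separate steps. The names group_ids, group_sizes and spread are only illustrative, and it assumes the series starts with a non-null value, as in the question.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    index=pd.date_range('2015/1/1', '2015/1/9'),
    data=[2, np.nan, 6, np.nan, np.nan, 2, np.nan, np.nan, np.nan],
)

# Every non-null value starts a new group; the NaNs that follow it
# inherit the same running count.
group_ids = s.notnull().cumsum()   # 1, 1, 2, 2, 2, 3, 3, 3, 3

# Number of rows in each group, broadcast back to the original index.
group_sizes = group_ids.groupby(group_ids).transform('count')

# Forward-fill the reported value, then split it evenly over its group.
spread = s.ffill() / group_sizes
print(spread)
```

Because each reported value is divided evenly across its own group, the total is preserved: spread.sum() equals s.sum().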
