Fill in missing values with cumulative growth from the last valid value

Time: 2016-12-09 07:20:04

Tags: python pandas numpy

Consider two pd.Series, s (positions) and r (returns):

import numpy as np
import pandas as pd

np.random.seed([3,1415])
s = pd.Series([2, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 1, np.nan], name='position')
r = pd.Series(np.random.lognormal(mean=.2, sigma=0.1, size=len(s)), name='returns')
pd.concat([r, s], axis=1)

    returns  position
0  0.987111       2.0
1  1.075896       NaN  # fill this in with 2 * 0.987111
2  1.002954       NaN  # fill this in with 2 * 0.987111 * 1.075896
3  0.974427       2.0
4  1.179477       NaN  # fill this in with 2 * 0.974427
5  1.218115       NaN  # fill this in with 2 * 0.974427 * 1.179477 
6  1.260645       NaN  # fill this in with 2 * 0.974427 * 1.179477 * 1.218115
7  1.264755       1.0
8  1.311979       NaN  # fill this in with 1 * 1.264755

Expected output

0    2.000000
1    1.974223
2    2.124057
3    2.000000
4    1.948854
5    2.298629
6    2.799995
7    1.000000
8    1.264755
dtype: float64

2 Answers:

Answer 0: (score: 2)

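You can use groupby on a Series built with fillna(0) and cumsum, then apply cumprod to the shifted returns within each group. The grouping Series is printed first, followed by the result: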

df = pd.concat([r, s], axis=1)
print (df.position.fillna(0).cumsum())
0    2.0
1    2.0
2    2.0
3    4.0
4    4.0
5    4.0
6    4.0
7    5.0
8    5.0
Name: position, dtype: float64

print (df.groupby(df.position.fillna(0).cumsum())
         .apply(lambda x: x.returns.shift().fillna(x.position).cumprod())
         .reset_index(drop=True))

0    2.000000
1    1.974223
2    2.124057
3    2.000000
4    1.948854
5    2.298629
6    2.799995
7    1.000000
8    1.264755
Name: returns, dtype: float64
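Note that the grouping key fillna(0).cumsum() only separates the runs correctly when the positions are non-zero and their running sums never repeat. As a hedged alternative (a sketch, not part of the answer above), the key can be built from s.notna().cumsum() instead, and the apply can be replaced by a grouped cumprod. The names groups, growth and filled are illustrative, and r is hard-coded from the printed table rather than regenerated:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 1, np.nan],
              name='position')
# Returns copied from the table in the question (rounded to 6 decimals).
r = pd.Series([0.987111, 1.075896, 1.002954, 0.974427, 1.179477,
               1.218115, 1.260645, 1.264755, 1.311979], name='returns')

# A new group starts at every row that has a valid position.
groups = s.notna().cumsum()

# Growth factor per row: 1 on rows with a position,
# the previous row's return on rows that need filling.
growth = r.shift().where(s.isna(), 1.0)

# Last valid position times the cumulative growth since that position.
filled = s.ffill() * growth.groupby(groups).cumprod()
print(filled)
```

Rows before the first valid position, if any, stay NaN because s.ffill() has nothing to carry forward there.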

Answer 1: (score: 1)

Here is a NumPy-based solution, run on the inputs r and s -

In [360]: r
Out[360]: 
0    0.987111
1    1.075896
2    1.002954
3    0.974427
4    1.179477
5    1.218115
6    1.260645
7    1.264755
8    1.311979
Name: returns, dtype: float64

In [361]: s
Out[361]: 
0    2.0
1    NaN
2    NaN
3    2.0
4    NaN
5    NaN
6    NaN
7    1.0
8    NaN
Name: position, dtype: float64

Sample run on these inputs -

In [362]: pd.Series(fillNaNs_numpy(r.values, s.values))
Out[362]: 
0    2.000000
1    1.974223
2    2.124057
3    2.000000
4    1.948854
5    2.298629
6    2.799995
7    1.000000
8    1.264755
dtype: float64
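The definition of fillNaNs_numpy does not appear in this answer. A minimal sketch that reproduces the sample output might look like the following. It reuses the idx and cidx expressions quoted in this answer and assumes bm is the boolean mask of valid positions. The name fill_nans_sketch and the cumprod-and-divide step are assumptions, not the answerer's code. It also assumes the first position is valid and that no position or return is zero:

```python
import numpy as np

def fill_nans_sketch(r, s):
    """Hypothetical stand-in for fillNaNs_numpy (not the original code)."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)

    bm = ~np.isnan(s)  # True where a position is given (assumed meaning)
    idx = np.append(np.nonzero(bm)[0], bm.size)
    # Group id for every row: 0 for the first run, 1 for the second, ...
    cidx = np.repeat(np.arange(idx.size - 1), idx[1:] - idx[:-1])

    # Per-row factor: the position itself on valid rows,
    # the previous row's return on rows to be filled.
    prev_r = np.concatenate(([1.0], r[:-1]))
    factors = np.where(bm, s, prev_r)

    # Global running product, then divide out everything
    # accumulated before the start of each run.
    cp = np.cumprod(factors)
    starts = idx[:-1]
    base = cp[starts] / s[starts]
    return cp / base[cidx]

# Values copied from the printed inputs above.
r = np.array([0.987111, 1.075896, 1.002954, 0.974427, 1.179477,
              1.218115, 1.260645, 1.264755, 1.311979])
s = np.array([2, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 1, np.nan])

filled = fill_nans_sketch(r, s)
print(np.round(filled, 6))
```

Dividing a global cumprod avoids a Python loop over runs, at the cost of the non-zero assumption; a per-run loop with np.cumprod would be slower but would not need it.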

Possible improvement:

1) At the last step, cidx is computed with

idx = np.append(np.nonzero(bm)[0], bm.size)
cidx = np.repeat(np.arange(idx.size-1), idx[1:] - idx[:-1])

Another way would be like so -

bm.cumsum()-1
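To see how the two ways of building cidx compare, a small check on the sample positions may help. This is a sketch that assumes bm is the mask ~np.isnan(s); the two results agree only when the first position is valid, as it is here:

```python
import numpy as np

s = np.array([2, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 1, np.nan])
bm = ~np.isnan(s)  # assumed meaning of bm: True where a position is given

# Version using run boundaries and np.repeat.
idx = np.append(np.nonzero(bm)[0], bm.size)
cidx_repeat = np.repeat(np.arange(idx.size - 1), idx[1:] - idx[:-1])

# Version using a running count of valid positions.
cidx_cumsum = bm.cumsum() - 1

print(cidx_repeat)  # [0 0 0 1 1 1 1 2 2]
print(cidx_cumsum)  # [0 0 0 1 1 1 1 2 2]
```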