I have a dataframe that looks like this:
Open High Low Close Volume MA Status Portfolio
Date
1958-03-12 42.41 42.41 42.41 42.41 2688889.0 41.3016 1.0 100.0
1958-03-13 42.46 42.46 42.46 42.46 3144444.0 41.3442 1.0 NaN
1958-03-14 42.33 42.33 42.33 42.33 2388889.0 41.3734 1.0 NaN
1958-03-17 42.04 42.04 42.04 42.04 2366667.0 41.4006 1.0 NaN
1958-03-18 41.89 41.89 41.89 41.89 2300000.0 41.4184 1.0 NaN
1958-03-19 42.09 42.09 42.09 42.09 2677778.0 41.4404 1.0 NaN
1958-03-20 42.11 42.11 42.11 42.11 2533333.0 41.4676 1.0 NaN
1958-03-21 42.42 42.42 42.42 42.42 2700000.0 41.5086 1.0 NaN
1958-03-24 42.58 42.58 42.58 42.58 2866667.0 41.5504 1.0 NaN
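If Status equals 1, I want the Portfolio column to be calculated as the previous day's value plus that day's return, i.e. the previous Portfolio value multiplied by Close / previous Close. I have this line: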
spx_daily.loc['1958-03-13':].loc[spx_daily['Status'] == 1, 'Portfolio'] = ((spx_daily.Close / spx_daily.Close.shift(1))) * spx_daily.Portfolio.shift(1)
However, when I run the code, the output is as follows:
Open High Low Close Volume MA Status Portfolio
Date
1958-03-12 42.41 42.41 42.41 42.41 2688889.0 41.3016 1.0 100.000000
1958-03-13 42.46 42.46 42.46 42.46 3144444.0 41.3442 1.0 100.117897
1958-03-14 42.33 42.33 42.33 42.33 2388889.0 41.3734 1.0 NaN
1958-03-17 42.04 42.04 42.04 42.04 2366667.0 41.4006 1.0 NaN
1958-03-18 41.89 41.89 41.89 41.89 2300000.0 41.4184 1.0 NaN
1958-03-19 42.09 42.09 42.09 42.09 2677778.0 41.4404 1.0 NaN
1958-03-20 42.11 42.11 42.11 42.11 2533333.0 41.4676 1.0 NaN
1958-03-21 42.42 42.42 42.42 42.42 2700000.0 41.5086 1.0 NaN
1958-03-24 42.58 42.58 42.58 42.58 2866667.0 41.5504 1.0 NaN
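Only the first row is calculated. Is that because the operation happens "all at once" and the remaining rows are seen as NaN?
How can I solve this while avoiding looping over the rows?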
Answer 0 (score: 0)
Use Series.fillna + Series.cumprod:
# Fill the NaN rows with the daily growth factor (1 where Status != 1), then compound
df['Portfolio'] = df['Portfolio'].fillna((df['Close'] / df['Close'].shift()).mask(df.Status.ne(1), 1))
df['Portfolio'] = df['Portfolio'].cumprod()
print(df)
Open High Low Close Volume MA Status Portfolio
Date
1958-03-12 42.41 42.41 42.41 42.41 2688889.0 41.3016 1.0 100.000000
1958-03-13 42.46 42.46 42.46 42.46 3144444.0 41.3442 1.0 100.117897
1958-03-14 42.33 42.33 42.33 42.33 2388889.0 41.3734 1.0 99.811365
1958-03-17 42.04 42.04 42.04 42.04 2366667.0 41.4006 1.0 99.127564
1958-03-18 41.89 41.89 41.89 41.89 2300000.0 41.4184 1.0 98.773874
1958-03-19 42.09 42.09 42.09 42.09 2677778.0 41.4404 1.0 99.245461
1958-03-20 42.11 42.11 42.11 42.11 2533333.0 41.4676 1.0 99.292620
1958-03-21 42.42 42.42 42.42 42.42 2700000.0 41.5086 1.0 100.023579
1958-03-24 42.58 42.58 42.58 42.58 2866667.0 41.5504 1.0 100.400849
I used df instead of spx_daily; I just want you to get the idea.
Checking with one row where Status == 0:
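To make the idea concrete, here is a minimal, self-contained sketch of why the original one-liner fills only one row and why fillna + cumprod fills all of them. It rebuilds only the Close, Status and Portfolio columns from the question; the variable names `one_shot` and `factor` are mine, not from the question. It also assumes the starting value (100.0) is the only non-NaN entry in Portfolio, since any other pre-filled value would be multiplied in as well.

```python
import numpy as np
import pandas as pd

# Close prices copied from the question; the other columns are omitted for brevity.
dates = pd.to_datetime([
    "1958-03-12", "1958-03-13", "1958-03-14", "1958-03-17", "1958-03-18",
    "1958-03-19", "1958-03-20", "1958-03-21", "1958-03-24",
])
close = [42.41, 42.46, 42.33, 42.04, 41.89, 42.09, 42.11, 42.42, 42.58]
df = pd.DataFrame({"Close": close, "Status": 1.0}, index=dates)
df["Portfolio"] = np.nan
df.iloc[0, df.columns.get_loc("Portfolio")] = 100.0  # starting value

# Right-hand side of the original one-liner: it is evaluated once, on the
# data as it is now, so Portfolio.shift(1) is non-NaN for the second row only.
one_shot = df["Close"] / df["Close"].shift(1) * df["Portfolio"].shift(1)
print(one_shot.notna().sum())  # number of rows the one-liner can fill

# Since Portfolio[t] = Portfolio[t-1] * factor[t], the whole column is the
# cumulative product of the starting value followed by the daily factors.
factor = (df["Close"] / df["Close"].shift(1)).mask(df["Status"].ne(1), 1)
df["Portfolio"] = df["Portfolio"].fillna(factor).cumprod()
print(df)
```

The first row keeps its 100.0 because fillna only touches NaN entries, so the cumulative product starts from the seed value.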
df.iloc[4, 6] = 0  # set Status to 0 on 1958-03-18
print(df)
Open High Low Close Volume MA Status Portfolio
Date
1958-03-12 42.41 42.41 42.41 42.41 2688889.0 41.3016 1.0 100.0
1958-03-13 42.46 42.46 42.46 42.46 3144444.0 41.3442 1.0 NaN
1958-03-14 42.33 42.33 42.33 42.33 2388889.0 41.3734 1.0 NaN
1958-03-17 42.04 42.04 42.04 42.04 2366667.0 41.4006 1.0 NaN
1958-03-18 41.89 41.89 41.89 41.89 2300000.0 41.4184 0.0 NaN
1958-03-19 42.09 42.09 42.09 42.09 2677778.0 41.4404 1.0 NaN
1958-03-20 42.11 42.11 42.11 42.11 2533333.0 41.4676 1.0 NaN
1958-03-21 42.42 42.42 42.42 42.42 2700000.0 41.5086 1.0 NaN
1958-03-24 42.58 42.58 42.58 42.58 2866667.0 41.5504 1.0 NaN
df['Portfolio'] = df['Portfolio'].fillna((df['Close'] / df['Close'].shift()).mask(df.Status.ne(1), 1))
df['Portfolio'] = df['Portfolio'].cumprod()
print(df)
Open High Low Close Volume MA Status Portfolio
Date
1958-03-12 42.41 42.41 42.41 42.41 2688889.0 41.3016 1.0 100.000000
1958-03-13 42.46 42.46 42.46 42.46 3144444.0 41.3442 1.0 100.117897
1958-03-14 42.33 42.33 42.33 42.33 2388889.0 41.3734 1.0 99.811365
1958-03-17 42.04 42.04 42.04 42.04 2366667.0 41.4006 1.0 99.127564
1958-03-18 41.89 41.89 41.89 41.89 2300000.0 41.4184 0.0 99.127564
1958-03-19 42.09 42.09 42.09 42.09 2677778.0 41.4404 1.0 99.600840
1958-03-20 42.11 42.11 42.11 42.11 2533333.0 41.4676 1.0 99.648167
1958-03-21 42.42 42.42 42.42 42.42 2700000.0 41.5086 1.0 100.381744
1958-03-24 42.58 42.58 42.58 42.58 2866667.0 41.5504 1.0 100.760365
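If you want to convince yourself that the vectorized version matches a row-by-row calculation, a small check like the following may help. It is only a sketch: it uses bare Series instead of the full dataframe, the names `seed`, `vectorized` and `looped` are made up for illustration, and it assumes that a day with Status != 1 leaves the portfolio unchanged (which is what masking the factor to 1 does).

```python
import numpy as np
import pandas as pd

close = pd.Series([42.41, 42.46, 42.33, 42.04, 41.89, 42.09, 42.11, 42.42, 42.58])
status = pd.Series(1.0, index=close.index)
status.iloc[4] = 0.0  # same row as in the Status == 0 check

# Vectorized: seed value followed by daily growth factors, then compounded.
factor = (close / close.shift(1)).mask(status.ne(1), 1)
seed = pd.Series(np.nan, index=close.index)
seed.iloc[0] = 100.0
vectorized = seed.fillna(factor).cumprod()

# Reference: explicit loop, carrying the previous value forward.
looped = [100.0]
for i in range(1, len(close)):
    growth = close.iloc[i] / close.iloc[i - 1] if status.iloc[i] == 1 else 1.0
    looped.append(looped[-1] * growth)

print(np.allclose(vectorized.to_numpy(), looped))
```

The loop is only there as a reference; on a long price history the cumprod version avoids the Python-level iteration.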