Below are (1) the starting df (called "close") and (2) one line of code together with the df it produces:
1
Date
2006-01-27 100.0
2006-01-30 100.0
2006-01-31 100.0
2006-02-01 100.0
2006-02-02 NaN
2006-02-03 NaN
2
close.apply(lambda x: x.shift(1) + x.shift(4))
Date
2006-01-27 NaN
2006-01-30 NaN
2006-01-31 NaN
2006-02-01 NaN
2006-02-02 100.706786
2006-02-03 NaN
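What I want is to take the output from #2 (100.706786), together with the existing df "close", and compute the next value in the series, i.e. the one for 2/03. That date needs the previous value (shift 1) plus the value four rows back (shift 4, i.e. 100).
How can I do this using only vectorized operations? I want to avoid a loop because it is extremely slow. Here is the loop I have: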
closedf = pd.DataFrame()
for num, date in enumerate(close.index[4:]):
    widget = close.apply(lambda x: x.shift(1) + x.shift(4)).iloc[num + 4]
    closedf[date] = close.iloc[num + 4] = widget
Answer 0: (score: 4)
Consider the series close
import numpy as np
import pandas as pd

close = pd.Series(
    [100] * 3 + [100.706786] + [np.nan] * 10,
    pd.date_range('2006-01-27', periods=14, name='Date')
)
close
Date
2006-01-27 100.000000
2006-01-28 100.000000
2006-01-29 100.000000
2006-01-30 100.706786
2006-01-31 NaN
2006-02-01 NaN
2006-02-02 NaN
2006-02-03 NaN
2006-02-04 NaN
2006-02-05 NaN
2006-02-06 NaN
2006-02-07 NaN
2006-02-08 NaN
2006-02-09 NaN
Freq: D, dtype: float64
Solution
This is a variant of the Fibonacci sequence. As far as I know, we can't "vectorize" it... (whatever "vectorize" means)
But we can create a generator that does the job
def shib(x1, x2, x3, x4):
    while True:
        # Slide the window by one; the new value is oldest + newest.
        x1, x2, x3, x4 = x2, x3, x4, x1 + x4
        yield x4
Then use it to assign the new values
from itertools import islice
close.iloc[4:] = list(islice(shib(*close[:4]), 0, len(close) - 4))
close
Date
2006-01-27 100.000000
2006-01-28 100.000000
2006-01-29 100.000000
2006-01-30 100.706786
2006-01-31 200.706786
2006-02-01 300.706786
2006-02-02 400.706786
2006-02-03 501.413572
2006-02-04 702.120358
2006-02-05 1002.827144
2006-02-06 1403.533930
2006-02-07 1904.947502
2006-02-08 2607.067860
2006-02-09 3609.895004
Freq: D, dtype: float64
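For reference, the steps above can be collected into one self-contained script. This is only a sketch: the names `lagged_sum` and `fill_recurrence` are hypothetical, and it assumes the first four entries of the series are the known seed values and that every later row should be overwritten.

```python
from itertools import islice

import numpy as np
import pandas as pd


def lagged_sum(x1, x2, x3, x4):
    """Yield x[n] = x[n-1] + x[n-4] forever, starting from four seed values."""
    while True:
        x1, x2, x3, x4 = x2, x3, x4, x1 + x4
        yield x4


def fill_recurrence(series):
    """Return a copy of `series` with every row after the first four recomputed."""
    out = series.copy()
    seeds = out.iloc[:4].tolist()
    n_missing = len(out) - 4
    # islice takes exactly as many generated values as there are rows to fill.
    out.iloc[4:] = list(islice(lagged_sum(*seeds), n_missing))
    return out


close = pd.Series(
    [100] * 3 + [100.706786] + [np.nan] * 10,
    pd.date_range('2006-01-27', periods=14, name='Date'),
)
filled = fill_recurrence(close)
```

Working on a copy leaves the original `close` untouched, which makes it easy to compare the seed rows before and after.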
Answer 1: (score: 0)
I actually found a very convenient (and very fast) solution using deque:
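from collections import deque

import pandas as pd

# Seed the window with the four known starting values.
queue = deque([100] * 4)
new_values = []
for _ in range(len(close.index) - 4):
    # Next value = most recent value + the value four steps back.
    nextval = queue[-1] + queue[0]
    new_values.append(nextval)
    queue.popleft()
    queue.append(nextval)
# The loop only produces values from the fifth row onward.
close = pd.DataFrame(new_values, index=close.index[4:])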
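A slightly tidier variant of the same idea, shown only as a sketch: `collections.deque` accepts a `maxlen` argument, so appending a new value drops the oldest one automatically and the explicit `popleft()` is not needed. The helper name `extend_with_deque` and the sample seeds and index are assumptions for illustration.

```python
from collections import deque

import pandas as pd


def extend_with_deque(seeds, n_total):
    """Build n_total values; each new one = previous value + value len(seeds) back."""
    window = deque(seeds, maxlen=len(seeds))
    values = list(seeds)
    for _ in range(n_total - len(seeds)):
        nextval = window[-1] + window[0]
        values.append(nextval)
        window.append(nextval)  # maxlen evicts the oldest entry
    return values


index = pd.date_range('2006-01-27', periods=8, name='Date')
close = pd.Series(extend_with_deque([100.0] * 4, len(index)), index=index)
```

Keeping the seeds in the returned list means the result has the same length as the index, so no slicing of the index is required.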