Is there a better way (in terms of performance) to run the following loop in pandas, assuming df is a DataFrame?
for i in range(len(df)):
    if df['signal'].iloc[i] == 0:  # if the signal is negative
        if df['position'].iloc[i - 1] - 0.02 < -1:  # if the row above - 0.02 < -1, set the current row to -1
            df['position'].iloc[i] = -1
        else:  # otherwise subtract 0.02 from the value of the row above
            df['position'].iloc[i] = df['position'].iloc[i - 1] - 0.02
    elif df['signal'].iloc[i] == 1:  # if the signal is positive
        if df['position'].iloc[i - 1] + 0.02 > 1:  # if the row above + 0.02 > 1, set the current row to 1
            df['position'].iloc[i] = 1
        else:  # otherwise add 0.02 to the value of the row above
            df['position'].iloc[i] = df['position'].iloc[i - 1] + 0.02
I would appreciate any suggestions. I am just starting out with pandas and may well be missing something fundamental.
Source CSV data:
Date,sp500,sp500 MA,UNRATE,UNRATE MA,signal,position
2000-01-01,,,4.0,4.191666666666665,1,0
2000-01-02,,,4.0,4.191666666666665,1,0
2000-01-03,102.93,95.02135,4.0,4.191666666666665,1,0
2000-01-04,98.91,95.0599,4.0,4.191666666666665,1,0
2000-01-05,99.08,95.11245000000001,4.0,4.191666666666665,1,0
2000-01-06,97.49,95.15450000000001,4.0,4.191666666666665,1,0
2000-01-07,103.15,95.21575000000001,4.0,4.191666666666665,1,0
2000-01-08,103.15,95.21575000000001,4.0,4.191666666666665,1,0
2000-01-09,103.15,95.21575000000001,4.0,4.191666666666665,1,0
Desired output:
Date,sp500,sp500 MA,UNRATE,UNRATE MA,signal,position
2000-01-01,,,4.0,4.191666666666665,1,0.02
2000-01-02,,,4.0,4.191666666666665,1,0.04
2000-01-03,102.93,95.02135,4.0,4.191666666666665,1,0.06
2000-01-04,98.91,95.0599,4.0,4.191666666666665,1,0.08
2000-01-05,99.08,95.11245000000001,4.0,4.191666666666665,1,0.1
2000-01-06,97.49,95.15450000000001,4.0,4.191666666666665,1,0.12
2000-01-07,103.15,95.21575000000001,4.0,4.191666666666665,1,0.14
2000-01-08,103.15,95.21575000000001,4.0,4.191666666666665,1,0.16
2000-01-09,103.15,95.21575000000001,4.0,4.191666666666665,1,0.18
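Update: all the answers below (at the time of writing) produce a constant 0.02 in the position column, which differs from my naive loop approach.
In other words, I am looking for a solution that gives 0.02, 0.04, 0.06, 0.08, etc. in the position column.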
Answer 0 (score: 2)
Don't use a loop. pandas specializes in vectorized operations, for example for signal == 0:
import numpy as np

pos_shift = df['position'].shift() - 0.02  # position of the row above, minus one step
m1 = df['signal'] == 0
m2 = pos_shift < -1
df.loc[m1 & m2, 'position'] = -1
df['position'] = np.where(m1 & ~m2, pos_shift, df['position'])
You can write something similar for signal == 1.
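A possible mirror-image sketch for signal == 1, assuming a small made-up frame that mimics the question's data (all signals 1, positions starting at 0). The fill_value=0 argument is an assumption to avoid a NaN in the first row. Because shift() reads the column's original values, each row takes a single step from the starting value of the row above rather than accumulating, which is the constant 0.02 described in the question's update.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's data: all signals are 1, positions start at 0.
df = pd.DataFrame({"signal": [1] * 9, "position": [0.0] * 9})

# Position of the row above plus one step; fill_value=0 is an assumption for the first row.
pos_shift = df["position"].shift(fill_value=0) + 0.02
m1 = df["signal"] == 1
m2 = pos_shift > 1

# Clamp to 1 where the step would overshoot, take the stepped value for the
# remaining signal == 1 rows, and leave all other rows untouched.
df["position"] = np.where(m1 & m2, 1.0, np.where(m1, pos_shift, df["position"]))
```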
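Answer 1 (score: 1)
Thanks for adding the data and sample output. First, I'm fairly sure you can't vectorize this, because each calculation depends on the output of the previous one. So this is the best I could do.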
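To illustrate that dependency, here is a small sketch comparing the step-by-step clamp with an obvious vectorized attempt (cumulative sum, then clip). The step of 0.5 and the four-element signal are made up for the illustration (the question uses 0.02) so that the bound is reached quickly.

```python
import numpy as np

STEP = 0.5  # exaggerated step (the question uses 0.02) so saturation shows up quickly
signal = np.array([1, 1, 1, 0])

# Sequential version: clamp after every step, as the question's loop does.
sequential = []
prev = 0.0
for s in signal:
    prev = min(prev + STEP, 1.0) if s == 1 else max(prev - STEP, -1.0)
    sequential.append(prev)

# Vectorized attempt: accumulate all steps first, clamp once at the end.
steps = np.where(signal == 1, STEP, -STEP)
clipped_cumsum = np.clip(np.cumsum(steps), -1.0, 1.0)
```

The two agree until the bound is hit. After that the loop steps down from the clamped value, while the cumulative sum still carries the overshoot, so cumsum plus clip only matches the loop as long as the position never saturates.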
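Your approach runs in about 0.116999 seconds on my machine.
This one runs in about 0.0039999 seconds.
It isn't vectorized, but it gives a good speedup, because building a plain list and assigning it back to the DataFrame at the end is faster than writing to the DataFrame on every iteration.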
def myfunc(pos_pre, signal):
    pos = pos_pre  # leave the position unchanged if signal is neither 0 nor 1
    if signal == 0:  # if the signal is negative
        # subtract 0.02 from the previous position, but never go below -1
        pos = pos_pre - 0.02
        if pos < -1:
            pos = -1
    elif signal == 1:  # if the signal is positive
        # add 0.02 to the previous position, but never go above 1
        pos = pos_pre + 0.02
        if pos > 1:
            pos = 1
    return pos

# Set the first position value explicitly: your method doesn't technically calculate it
# correctly, since there is no row before row 0. Here it will always be 0.02.
new_pos = [0.02]
# skip index zero since there is no previous row for it
for i in range(1, len(df)):
    new_pos.append(myfunc(pos_pre=new_pos[i - 1], signal=df['signal'].iloc[i]))
df['position'] = new_pos
Output:
df.position
0 0.02
1 0.04
2 0.06
3 0.08
4 0.10
5 0.12
6 0.14
7 0.16
8 0.18
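The same list-based idea can also be expressed with itertools.accumulate from the standard library. This is only a sketch of an alternative spelling, not a faster algorithm; the step helper, the STEP constant, and the small sample frame are hypothetical, and the starting position of 0 is an assumption taken from the question's data.

```python
from itertools import accumulate

import pandas as pd

STEP = 0.02  # step size used in the question


def step(prev, signal):
    """Move one step down (signal 0) or up (signal 1), clamped to [-1, 1]."""
    if signal == 0:
        return max(prev - STEP, -1.0)
    if signal == 1:
        return min(prev + STEP, 1.0)
    return prev  # any other signal leaves the position unchanged


# Hypothetical sample mirroring the question's data.
df = pd.DataFrame({"signal": [1] * 9, "position": [0] * 9})

# accumulate yields the initial value first, so drop it with [1:].
positions = list(accumulate(df["signal"].tolist(), step, initial=0.0))[1:]
df["position"] = positions
```

Here the first row is computed from the assumed starting value of 0 instead of being hard-coded to 0.02, so a leading signal of 0 would give -0.02.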
Answer 2 (score: 0)
Yes. When performance matters, operate on the underlying NumPy arrays instead of the pandas objects:
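signal = df['signal'].to_numpy()
# Work on a float copy: the CSV's position column is read as integers,
# and writing 0.02 into an integer array would truncate it to 0.
position = df['position'].to_numpy(dtype=float, copy=True)
for i in range(len(df)):
    # Note: for i == 0, position[i - 1] is the last row, as in the question's loop.
    if signal[i] == 0:
        if position[i - 1] - 0.02 < -1:
            position[i] = -1
        else:
            position[i] = position[i - 1] - 0.02
    elif signal[i] == 1:
        if position[i - 1] + 0.02 > 1:
            position[i] = 1
        else:
            position[i] = position[i - 1] + 0.02
df['position'] = position  # write the result back to the DataFrame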
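You may be surprised by the performance gain, which is often 10x or more.
Answer 3 (score: 0)
There is most likely a better way, but this approach should also work: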
df['previous'] = df['position'].shift()  # position of the row above (NaN for the first row)

def get_signal_value(row):
    if row.signal == 0:
        compare = row.previous - 0.02
        if compare < -1:
            return -1
        else:
            return compare
    elif row.signal == 1:
        compare = row.previous + 0.02
        if compare > 1:
            return 1
        else:
            return compare

df['new_signal'] = df.apply(get_signal_value, axis=1)