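I'm trying to build a trading backtest in Pandas, and I'm having trouble using np.where() as an 'if' statement to conditionally update other columns.
This is my initial df, where signal indicates buy, sell or no action (1 / -1 / 0). Based on these signals I want to update the Cash, Hold, Value and Total columns.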
open high low close change signal Cash Hold Value Total
time
2017-09-09 03:01:00 4255.000000 4256.799805 4233.600098 4252.799805 -0.000065 0 10000.0 0.0 0.0 10000.0
2017-09-09 03:02:00 4251.399902 4258.500000 4247.500000 4258.399902 0.002046 1 10000.0 0.0 0.0 10000.0
2017-09-09 03:03:00 4256.500000 4289.299805 4256.500000 4273.700195 0.001262 1 10000.0 0.0 0.0 10000.0
2017-09-09 03:04:00 4273.100098 4299.899902 4262.580566 4284.100098 0.001905 1 10000.0 0.0 0.0 10000.0
2017-09-09 03:05:00 4291.200195 4299.799805 4284.200195 4289.899902 -0.000854 -1 10000.0 0.0 0.0 10000.0
2017-09-09 03:06:00 4295.000000 4298.799805 4279.500000 4279.500000 -0.000047 0 10000.0 0.0 0.0 10000.0
2017-09-09 03:07:00 4278.000000 4278.299805 4277.000000 4277.799805 -0.000244 0 10000.0 0.0 0.0 10000.0
I can do this by manually calling one of the following functions for each row, depending on the signal:
def buy_update(i=i):
    pf['Cash'].iloc[i] = pf['Cash'].iloc[i-1] - trade_size
    pf['Holdings'].iloc[i] = pf['Holdings'].iloc[i-1] + (trade_size / pf['close'].iloc[i])
    pf['Holdings Value'].iloc[i] = pf['close'].iloc[i] * pf['Holdings'].iloc[i] # Update Values
    pf['Total Holding'].iloc[i] = pf['Cash'].iloc[i] + pf['Holdings Value'].iloc[i] # Update Values

def sell_update(i=i):
    pf['Cash'].iloc[i] = (pf['Cash'].iloc[i-1] + (pf['Holdings'].iloc[i-1] * pf['close'].iloc[i])) # get cash for sale
    pf['Holdings'].iloc[i] = 0 # Sell down all assets
    pf['Holdings Value'].iloc[i] = pf['close'].iloc[i] * pf['Holdings'].iloc[i] # Update Values
    pf['Total Holding'].iloc[i] = pf['Cash'].iloc[i] + pf['Holdings Value'].iloc[i] # Update Values

def no_action(i=i):
    pf['Cash'].iloc[i] = pf['Cash'].iloc[i-1]
    pf['Holdings'].iloc[i] = pf['Holdings'].iloc[i-1]
    pf['Holdings Value'].iloc[i] = pf['close'].iloc[i] * pf['Holdings'].iloc[i] # Update Values
    pf['Total Holding'].iloc[i] = pf['Cash'].iloc[i] + pf['Holdings Value'].iloc[i] # Update Values
That then produces this:
open high low close change signal Cash Hold Value Total
time
2017-09-09 03:01:00 4255.000000 4256.799805 4233.600098 4252.799805 -0.000065 0 10000.00000 0.000000 0.000000 10000.000000
2017-09-09 03:02:00 4251.399902 4258.500000 4247.500000 4258.399902 0.002046 1 9900.00000 0.023483 100.000000 10000.000000
2017-09-09 03:03:00 4256.500000 4289.299805 4256.500000 4273.700195 0.001262 1 9800.00000 0.046882 200.359297 10000.359297
2017-09-09 03:04:00 4273.100098 4299.899902 4262.580566 4284.100098 0.001905 1 9700.00000 0.070224 300.846864 10000.846864
2017-09-09 03:05:00 4291.200195 4299.799805 4284.200195 4289.899902 -0.000854 -1 10001.25415 0.000000 0.000000 10001.254150
2017-09-09 03:06:00 4295.000000 4298.799805 4279.500000 4279.500000 -0.000047 0 10001.25415 0.000000 0.000000 10001.254150
2017-09-09 03:07:00 4278.000000 4278.299805 4277.000000 4277.799805 -0.000244 0 10001.25415 0.000000 0.000000 10001.254150
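I thought nested np.where() calls could invoke the right function based on the signal column, but I've had no luck. The following loop iterates over each row: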
for i in range(len(pf)):
    np.where(pf['signal'].iloc[i] == -1, sell_update(i), np.where(pf['signal'].iloc[i] == 1, buy_update(i), no_action(i)))
    print(i)
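I think it currently calls every function (sell, then buy, then no action, each one overwriting the last), and it also raises a SettingWithCopyWarning.
In addition, a for loop over every row is obviously very slow. Is there a way to vectorize this?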
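The suspicion about np.where() can be checked in isolation. Python evaluates every argument of a function call before the call happens, so all three handlers run no matter what the condition is. The sketch below uses hypothetical stand-in handlers (sell, buy, hold) that only record that they were called; they are not the functions from the question.

```python
import numpy as np

calls = []  # records which (hypothetical) handlers actually ran

def sell(i):
    calls.append("sell")
    return -1

def buy(i):
    calls.append("buy")
    return 1

def hold(i):
    calls.append("hold")
    return 0

signal = 1

# Every argument is evaluated before np.where() is entered,
# so all three handlers run regardless of the condition.
np.where(signal == -1, sell(0), np.where(signal == 1, buy(0), hold(0)))
eager_calls = list(calls)

# A plain if/elif/else runs only the matching branch.
calls.clear()
if signal == -1:
    sell(0)
elif signal == 1:
    buy(0)
else:
    hold(0)
branch_calls = list(calls)

print(eager_calls)   # all three handlers
print(branch_calls)  # only the buy handler
```

np.where() is meant for choosing between values that already exist, not for choosing which side effect to run, so an ordinary if/elif is the right tool inside a row loop.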
Answer 0 (score: 1)
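When the calculation gets this complex, it is hard to vectorize. Since element-by-element processing in pandas is slow, you can convert the DataFrame to a list of dicts and do the calculation on those. Here is an example that uses cytoolz: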
import io
import pandas as pd
text="""time open high low close change signal Cash Hold Value Total
2017-09-09 03:01:00 4255.000000 4256.799805 4233.600098 4252.799805 -0.000065 0 10000.0 0.0 0.0 10000.0
2017-09-09 03:02:00 4251.399902 4258.500000 4247.500000 4258.399902 0.002046 1 10000.0 0.0 0.0 10000.0
2017-09-09 03:03:00 4256.500000 4289.299805 4256.500000 4273.700195 0.001262 1 10000.0 0.0 0.0 10000.0
2017-09-09 03:04:00 4273.100098 4299.899902 4262.580566 4284.100098 0.001905 1 10000.0 0.0 0.0 10000.0
2017-09-09 03:05:00 4291.200195 4299.799805 4284.200195 4289.899902 -0.000854 -1 10000.0 0.0 0.0 10000.0
2017-09-09 03:06:00 4295.000000 4298.799805 4279.500000 4279.500000 -0.000047 0 10000.0 0.0 0.0 10000.0
2017-09-09 03:07:00 4278.000000 4278.299805 4277.000000 4277.799805 -0.000244 0 10000.0 0.0 0.0 10000.0"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
trade_size = 100
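import cytoolz

def f(p, c):
    # p is the previous (already updated) row, c is the current raw row
    signal = c["signal"]
    if signal == 1:  # buy a fixed amount
        cash = p["Cash"] - trade_size
        hold = p["Hold"] + trade_size / c["close"]
    elif signal == -1:  # sell the whole position
        cash = p["Cash"] + p["Hold"] * c["close"]
        hold = 0.0
    else:  # no action: carry the previous position forward
        cash = p["Cash"]
        hold = p["Hold"]
    value = hold * c["close"]
    return cytoolz.merge(c, {"Cash": cash, "Hold": hold, "Value": value, "Total": cash + value})

pd.DataFrame(list(cytoolz.accumulate(f, df.to_dict("records"))))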
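If adding cytoolz is not an option, the same row-by-row recurrence can be sketched with plain NumPy arrays. The running state lives in two local variables and the columns are assigned once at the end, which also avoids the chained `pf[...].iloc[i] = ...` assignments that trigger SettingWithCopyWarning. The function name run_backtest and the start_cash parameter are assumptions made for illustration, and unlike the question's functions it starts from start_cash instead of reading the previous row.

```python
import numpy as np
import pandas as pd

def run_backtest(df, trade_size=100.0, start_cash=10000.0):
    """Return a copy of df with Cash, Hold, Value and Total filled in row by row."""
    signal = df["signal"].to_numpy()
    close = df["close"].to_numpy(dtype=float)
    n = len(df)
    cash = np.empty(n)
    hold = np.empty(n)
    cur_cash, cur_hold = start_cash, 0.0  # running state carried between rows
    for i in range(n):
        if signal[i] == 1:  # buy a fixed amount
            cur_cash -= trade_size
            cur_hold += trade_size / close[i]
        elif signal[i] == -1:  # sell the whole position
            cur_cash += cur_hold * close[i]
            cur_hold = 0.0
        # any other signal: carry the state forward unchanged
        cash[i] = cur_cash
        hold[i] = cur_hold
    out = df.copy()
    out["Cash"] = cash
    out["Hold"] = hold
    out["Value"] = hold * close
    out["Total"] = cash + hold * close
    return out

# close and signal values taken from the question's sample data
demo = pd.DataFrame({
    "close": [4252.799805, 4258.399902, 4273.700195, 4284.100098,
              4289.899902, 4279.500000, 4277.799805],
    "signal": [0, 1, 1, 1, -1, 0, 0],
})
result = run_backtest(demo)
print(result)
```

The loop is still O(n) in Python, but it only touches NumPy scalars and floats rather than pandas indexers, which is usually much cheaper per iteration.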