Question

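I want to apply the k-nearest neighbors (kNN) algorithm to the S&P 500 index to predict future prices, and build a quantitative algorithmic trading model in Python with the scikit-learn library. I have a basic understanding of kNN, but I am new to machine learning in Python, so I would be glad if someone could help me.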

Here is my simulation logic:

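Asset: S&P 500 index monthly prices (investable through an ETF)
Logic
- At the end of each month, use kNN to predict next month's price direction (up or down) -> up: buy the S&P 500 index; down: hold cash (assuming a 3% annual return)
- Training dataset: the most recent rolling 12 months of monthly data (the training window moves forward over time, like a moving average)
- Independent variables: returns over the most recent 3, 6, 9 and 12 months, and the rolling standard deviation of monthly returns over the most recent 12 months
- Dependent variable: next month's return, labeled as positive or negative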
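A minimal, dependency-free sketch of the feature and label construction described in the list above, assuming a plain Python list of month-end prices. It uses trailing returns (ratio minus 1) instead of the price ratios `R3`...`R12` in the code; subtracting the same constant from every ratio does not change kNN distances. The helper names `trailing_return` and `build_rows` are hypothetical. Note that the label looks one month ahead, so a row's label only becomes known a month later.

```python
import statistics


def trailing_return(prices, t, n):
    """Return over the n months ending at index t."""
    return prices[t] / prices[t - n] - 1.0


def build_rows(prices):
    """Turn month-end prices into (features, label) rows.

    Features at month t: trailing 3/6/9/12-month returns plus the sample
    standard deviation of the last 12 monthly returns (like pandas'
    rolling(12).std()). Label: 1 if the price rises from month t to t+1,
    else 0. Months without 12 months of history or without a following
    month are skipped.
    """
    # monthly[i] is the return from month i-1 to month i (undefined for i = 0)
    monthly = [None] + [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]
    rows = []
    for t in range(12, len(prices) - 1):
        feats = [trailing_return(prices, t, n) for n in (3, 6, 9, 12)]
        feats.append(statistics.stdev(monthly[t - 11:t + 1]))
        label = 1 if prices[t + 1] > prices[t] else 0
        rows.append((feats, label))
    return rows
```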

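Here is my code. I can build the basic dataset, but I don't know how to write the main algorithm and the simulation logic. Can anyone complete this code?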

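import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pandas_datareader.data as web

def price(stock, start):
    # Daily adjusted closes (the 'yahoo' source in pandas_datareader may no longer work)
    price = web.DataReader(name=stock, data_source='yahoo', start=start)['Adj Close']
    # Normalize to 1.0 at the start and keep the last price of each month
    return price.div(price.iat[0]).resample('M').last().to_frame('price')

a = price('SPY','2000-01-01')
# Cash account growing at 3% per year, compounded monthly
a['cash'] = [(1.03**(1/12))**x for x in range(len(a.index))]
# Price ratios over the last 3, 6, 9 and 12 months (1 + trailing return)
a['R3'] = a.price/a.price.shift(3)
a['R6'] = a.price/a.price.shift(6)
a['R9'] = a.price/a.price.shift(9)
a['R12'] = a.price/a.price.shift(12)
# Standard deviation of the last 12 monthly returns
a['rollingstd'] = a.price.pct_change().rolling(12).std()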
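The missing main loop and simulation can be sketched as a walk-forward backtest. This is a minimal pure-Python sketch, not scikit-learn code: the hypothetical `knn_predict` does essentially the same uniform-weight Euclidean majority vote that `KNeighborsClassifier(n_neighbors=3)` performs with its default metric, and the hypothetical `walk_forward` refits on the 12 most recent rows whose outcomes are already known, then chooses between the index and cash (assumed 3% a year, compounded monthly) for the next month. It assumes feature vectors and next-month returns are already aligned by month. Because kNN is distance-based, in practice you would standardize the features (for example with scikit-learn's `StandardScaler`) so the return features and the rolling standard deviation are on comparable scales.

```python
import math
from collections import Counter

# Assumed cash yield: 3% per year, compounded monthly
CASH_MONTHLY = 1.03 ** (1 / 12) - 1


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def walk_forward(features, next_returns, window=12, k=3):
    """Walk-forward backtest; returns (predictions, equity curve).

    features[t]     : feature vector known at the end of month t
    next_returns[t] : realized return from month t to month t+1
    At each month t >= window, train on rows t-window .. t-1 (their outcomes
    are known by month t), predict the direction for month t+1, and hold the
    index if the vote is "up", otherwise hold cash.
    """
    labels = [1 if r > 0 else 0 for r in next_returns]
    equity = [1.0]
    preds = []
    for t in range(window, len(features)):
        pred = knn_predict(features[t - window:t], labels[t - window:t], features[t], k)
        preds.append(pred)
        period_return = next_returns[t] if pred == 1 else CASH_MONTHLY
        equity.append(equity[-1] * (1 + period_return))
    return preds, equity
```

With scikit-learn and NumPy arrays `X` and `y`, the body of the loop would become `clf.fit(X[t-12:t], y[t-12:t])` followed by `clf.predict(X[t:t+1])`.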

Answer 1

I got it working. This is a different version of the strategy that uses a fractal momentum score, but it may help.
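The fractal momentum score computed by the `fractal`/`meanfractal` functions can be read as an efficiency ratio averaged over lags 1 to 12: the net price move divided by the total path length, counted only when the net move is up. A minimal pure-Python sketch of the same computation for a single month, assuming a list of month-end prices; `fractal_score` is a hypothetical name, and it returns `None` where the pandas version produces NaN.

```python
def fractal_score(prices, t, max_lag=12):
    """Mean fractal (efficiency-ratio) momentum score at index t.

    For each lag n in 1..max_lag: |net move over n months| divided by the
    sum of absolute month-to-month moves over those n months, kept only
    when the net move is up (otherwise 0). Returns None when there is not
    enough history or a window has no movement at all.
    """
    if t < max_lag:
        return None
    total = 0.0
    for n in range(1, max_lag + 1):
        net = prices[t] - prices[t - n]
        path = sum(abs(prices[j] - prices[j - 1]) for j in range(t - n + 1, t + 1))
        if path == 0:
            return None  # 0/0 in the pandas version -> NaN
        total += abs(net) / path if net > 0 else 0.0
    return total / max_lag
```

A steadily rising series scores 1.0, a steadily falling one scores 0.0, and choppy series land in between.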

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pandas_datareader.data as web
from sklearn import neighbors, svm
from sklearn.ensemble import RandomForestClassifier

def price(stock, start):
    price = web.DataReader(name=stock, data_source='yahoo', start=start)['Adj Close']
    return price.div(price.iat[0]).resample('M').last().to_frame('price')

def fractal(a, p):
    # For each lag 1..p: |net move| / sum of |monthly moves|, kept only for up moves.
    # Side effect: adds helper columns (direction, abs, volatility, fractal) to `a`.
    df = pd.DataFrame()
    for count in range(1,p+1):
        a['direction'] = np.where(a['price'].diff(count)>0,1,0)
        a['abs'] = a['price'].diff(count).abs()
        a['volatility'] = a.price.diff().abs().rolling(count).sum()
        a['fractal'] = a['abs']/a['volatility']*a['direction']
        df = pd.concat([df, a['fractal']], axis=1)
    return df

def meanfractal(a, l=12):
    a['meanfractal']= pd.DataFrame(fractal(a, l)).sum(axis=1,skipna=False)/l

a = price('^KS11','2000-01-01')
a['cash'] = [(1.03**(1/12))**x for x in range(len(a.index))]
a['meanfractal']= pd.DataFrame(fractal(a, 12)).sum(axis=1,skipna=False)/12
a['rollingstd'] = a.price.pct_change().shift(1).rolling(12).std()
a['result'] = np.where(a.price > a.price.shift(1), 1,0)
a = a.dropna()

print(a)

# clf (kNN) and clf1 (SVM) are created, but only clf3 (random forest) is used below
clf = neighbors.KNeighborsClassifier(n_neighbors=3)
clf1 = svm.SVC()
clf3 = RandomForestClassifier(n_estimators=5)
features = ['meanfractal', 'rollingstd']

a['predicted'] = np.nan
for i in range(12,len(a.index)):
    x = a[features].iloc[i-12:i]   # last 12 rows of features
    y = a['result'].iloc[i-12:i]   # labels for the same rows
    clf3.fit(x, y)
    # Note: this predicts the last *training* row (i-1) and stores it at row i
    a.iloc[i, a.columns.get_loc('predicted')] = clf3.predict(x)[-1]

a = a.dropna()
a['price'] = a['price'].div(a['price'].iloc[0])
print(a)
# Scores the final model (trained on the last 12 rows) against every row,
# not the accuracy of the month-by-month predictions
accuracy=clf3.score(a[features],a['result'])

# '결과' ("result"): fully invested when the previous month's prediction is 1, flat otherwise
a['결과'] = np.where(a.predicted.shift(1)==1,a.price/a.price.shift(1),1).cumprod()
# 'result': half index / half cash (1.0026 per month, about 3% a year) when 1, all cash otherwise
a['result'] = np.where(a.predicted.shift(1)==1,(a.price/a.price.shift(1)+1.0026)/2,1.0026).cumprod()
# '동일비중' ("equal weight"): always half index, half cash
a['동일비중'] = ((a.price/a.price.shift(1)+1.0026)/2).cumprod()
a[['result','price','결과']].plot()
plt.show()
print ("Predicted model accuracy: "+ str(accuracy))
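`clf3.score(...)` evaluates only the final fitted model on every row, so it is not the hit rate of the month-by-month predictions. A minimal sketch of measuring the walk-forward hit rate instead, assuming each stored prediction has already been aligned with the label it was meant to predict; `hit_rate` is a hypothetical helper.

```python
def hit_rate(predicted, actual):
    """Fraction of months where the stored prediction matched the realized label.

    Pairs where either value is missing (None) are skipped.
    """
    pairs = [(p, a) for p, a in zip(predicted, actual) if p is not None and a is not None]
    if not pairs:
        raise ValueError("no overlapping predictions and labels")
    return sum(p == a for p, a in pairs) / len(pairs)
```

With pandas Series aligned the same way, the equivalent is `(pred == actual).mean()` after dropping missing values.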

K-nearest neighbors (KNN) algorithm on the S&P 500 index in Python
