Use the following code. First, calculate the mid prices from the highest and lowest prices.
high_prices = df.loc[:,'High'].as_matrix()
low_prices = df.loc[:,'Low'].as_matrix()
mid_prices = (high_prices+low_prices)/2.0
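The FutureWarning shown further down comes from `.as_matrix()`. A minimal sketch of the same mid-price step without it, assuming a pandas version that provides `Series.to_numpy()` (on older versions `.values`, as the warning suggests, plays the same role). The two-row frame is a hypothetical stand-in for the real data, and `pd.to_numeric` is only there in case the price columns were read in as strings:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the real price data.
df = pd.DataFrame({"High": [35.235, 34.97], "Low": [34.5, 33.91]})

# pd.to_numeric guards against price columns that arrived as strings (dtype object).
high_prices = pd.to_numeric(df["High"]).to_numpy()
low_prices = pd.to_numeric(df["Low"]).to_numpy()

# Mid price is the element-wise average of high and low.
mid_prices = (high_prices + low_prices) / 2.0
print(mid_prices)
```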
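Now you can split the training data and the test data. The training data will be the first 11,000 data points of the time series, and the rest will be test data.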
train_data = mid_prices[:11000]
test_data = mid_prices[11000:]
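Now you need to define a scaler to normalize the data. MinMaxScaler scales all the data into the range 0 to 1. You can also reshape the training and test data to the shape [data_size, num_features].
Scale the data to be between 0 and 1. When scaling, remember: you normalize both the test and the training data with respect to the training data, because you are not supposed to have access to the test data.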
scaler = MinMaxScaler()
train_data = train_data.reshape(-1,1)
test_data = test_data.reshape(-1,1)
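To make the "normalize with respect to the training data" point concrete, here is a small sketch with made-up numbers (the arrays are illustrative, not taken from the price data). The scaler is fitted on the training column only and then applied to both sets, so test values outside the training range land outside [0, 1]:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data, reshaped to [data_size, num_features] as the scaler expects.
train = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
test = np.array([6.0, 7.0]).reshape(-1, 1)

scaler = MinMaxScaler()
scaler.fit(train)  # min and max are learned from the training data only

train_scaled = scaler.transform(train).reshape(-1)
test_scaled = scaler.transform(test).reshape(-1)  # reuses the training min and max

print(train_scaled)
print(test_scaled)
```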
Train the scaler on the training data and smooth the data.
smoothing_window_size = 2500
for di in range(0,10000,smoothing_window_size):
    scaler.fit(train_data[di:di+smoothing_window_size,:])
    train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])
Normalize the last bit of remaining data.
scaler.fit(train_data[di+smoothing_window_size:,:])
train_data[di+smoothing_window_size:,:] = scaler.transform(train_data[di+smoothing_window_size:,:])
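The window loop above uses fixed bounds (0 to 10000 in steps of 2500), which fits an 11,000-point training set. As a rough sketch of a length-independent variant, the loop can step over the actual size of the array so that no window is ever empty. The function name `normalize_in_windows` is hypothetical, and plain NumPy min-max scaling stands in for MinMaxScaler here, so this is an illustration and not the tutorial's own method:

```python
import numpy as np

def normalize_in_windows(train, window):
    """Min-max scale each consecutive window of a 1-D series to [0, 1]."""
    out = np.asarray(train, dtype=float).copy()
    # Step over the real length of the data, so every slice has at least one row.
    for start in range(0, out.size, window):
        chunk = out[start:start + window]
        lo, hi = chunk.min(), chunk.max()
        # A constant chunk has hi == lo; map it to 0 instead of dividing by zero.
        out[start:start + window] = (chunk - lo) / (hi - lo) if hi > lo else 0.0
    return out

# Assumed example: a short, steadily rising series of 3409 points.
prices = np.linspace(10.0, 50.0, 3409)
scaled = normalize_in_windows(prices, window=2500)
print(scaled.min(), scaled.max())
```

With 3409 points and a window of 2500, this produces one full window and one shorter final window of 909 points.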
Reshape both the training and the test data back to the shape [data_size].
train_data = train_data.reshape(-1)
Normalize the test data.
test_data = scaler.transform(test_data).reshape(-1)
Now perform exponential moving average smoothing, so the data will have a smoother curve than the original ragged data.
EMA = 0.0
gamma = 0.1
for ti in range(11000):
    EMA = gamma*train_data[ti] + (1-gamma)*EMA
    train_data[ti] = EMA
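The smoothing loop above can be wrapped in a small function that iterates over the actual length of its input instead of a hard-coded 11000. This is only a sketch; `ema_smooth` is a hypothetical helper name, and the recurrence is the same one used in the loop above:

```python
import numpy as np

def ema_smooth(values, gamma=0.1):
    """Exponential moving average: EMA_t = gamma * x_t + (1 - gamma) * EMA_{t-1}."""
    smoothed = np.empty(len(values), dtype=float)
    ema = 0.0  # same starting value as the loop above
    for i, x in enumerate(values):
        ema = gamma * x + (1.0 - gamma) * ema
        smoothed[i] = ema
    return smoothed

smoothed = ema_smooth(np.ones(3))
print(smoothed)  # approximately [0.1, 0.19, 0.271]
```

Because the average starts at 0, the first smoothed values are pulled toward zero until enough points have been seen.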
Used for visualization and test purposes.
all_mid_data = np.concatenate([train_data,test_data],axis=0)
In other words, you say that the prediction at t + 1 is the average of all the stock prices you observed within the window from t to t−N.
window_size = 100
N = train_data.size
std_avg_predictions = []
std_avg_x = []
mse_errors = []
for pred_idx in range(window_size,N):
    if pred_idx >= N:
        date = dt.datetime.strptime(k, '%Y-%m-%d').date() + dt.timedelta(days=1)
    else:
        date = df.loc[pred_idx,'Date']
    std_avg_predictions.append(np.mean(train_data[pred_idx-window_size:pred_idx]))
    mse_errors.append((std_avg_predictions[-1]-train_data[pred_idx])**2)
    std_avg_x.append(date)
print('MSE error for standard averaging: %.5f'%(0.5*np.mean(mse_errors)))
plt.figure(figsize = (18,9))
plt.plot(range(df.shape[0]),all_mid_data,color='b',label='True')
plt.plot(range(window_size,N),std_avg_predictions,color='orange',label='Prediction')
plt.xticks(range(0,df.shape[0],50),df['Date'].loc[::50],rotation=45)
plt.xlabel('Date')
plt.ylabel('Mid Price')
plt.legend(fontsize=18)
plt.show()
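The one-step-ahead averaging used above can be isolated from the date handling and plotting. The sketch below is a stripped-down illustration with toy numbers; `standard_average_predictions` is a hypothetical name, and it returns the plain mean squared error, whereas the code above prints half of it:

```python
import numpy as np

def standard_average_predictions(series, window_size):
    """Predict each point as the mean of the window_size points before it."""
    series = np.asarray(series, dtype=float)
    # Predictions exist only for indices window_size .. len(series) - 1.
    positions = np.arange(window_size, series.size)
    preds = np.array([series[i - window_size:i].mean() for i in positions])
    if positions.size:
        mse = float(np.mean((preds - series[positions]) ** 2))
    else:
        mse = float("nan")  # series shorter than or equal to the window
    return positions, preds, mse

positions, preds, mse = standard_average_predictions([1, 2, 3, 4, 5], window_size=2)
print(positions, preds, mse)
```

Note that the predictions cover only positions `window_size` to `N-1` of the series passed in. In the code above `N` is `train_data.size`, so the orange line ends where the training data ends, while the blue line is drawn over `all_mid_data`, which also includes the test data.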
There are some warnings:
hp@hp-desktop:~$ python3 predict_stock.py
Loaded data from the Kaggle repository
predict_stock.py:76: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
high_prices = df.loc[:,'High'].as_matrix()
predict_stock.py:77: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
low_prices = df.loc[:,'Low'].as_matrix()
MSE error for standard averaging: 0.00418
But it works fine, except that the date format cannot be changed. When using alphavantage, this is the output:
predict_stock.py:76: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
high_prices = df.loc[:,'High'].as_matrix()
predict_stock.py:77: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
low_prices = df.loc[:,'Low'].as_matrix()
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:595: DataConversionWarning: Data with input dtype object was converted to float64 by MinMaxScaler.
warnings.warn(msg, DataConversionWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:595: DataConversionWarning: Data with input dtype object was converted to float64 by MinMaxScaler.
warnings.warn(msg, DataConversionWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:595: DataConversionWarning: Data with input dtype object was converted to float64 by MinMaxScaler.
warnings.warn(msg, DataConversionWarning)
Traceback (most recent call last):
File "predict_stock.py", line 98, in <module>
scaler.fit(train_data[di:di+smoothing_window_size,:])
File "/usr/local/lib/python3.6/dist-packages/sklearn/preprocessing/data.py", line 334, in fit
return self.partial_fit(X, y)
File "/usr/local/lib/python3.6/dist-packages/sklearn/preprocessing/data.py", line 362, in partial_fit
force_all_finite="allow-nan")
File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 582, in check_array
context))
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by MinMaxScaler.
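One possible reading of this ValueError is that the alphavantage series is much shorter than the 11,000 points the hard-coded window bounds expect. The sketch below assumes about 3409 rows (a guess based on the highest index in the sample rows, not a confirmed figure) and only counts how many rows each window slice would contain:

```python
import numpy as np

n_rows = 3409  # assumption: row count guessed from the index in the sample rows
train = np.zeros((n_rows, 1))[:11000]  # slicing past the end just returns all rows
window = 2500

# Same loop bounds as the script; NumPy slices past the end are silently empty.
sizes = [train[di:di + window, :].shape[0] for di in range(0, 10000, window)]
print(sizes)  # [2500, 909, 0, 0]
```

Under that assumption the third window (`di = 5000`) has shape (0, 1), which would match the shape reported in the error message. It would also mean that `mid_prices[11000:]`, the test set, is empty.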
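The first plot, from the CSV file, was produced correctly with the actual data stored, but the second plot, with the predictions and the MSE, was not created.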
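In the plot above, you can see that the prediction (orange line) stops before the end of the data, so it appears to be a moving average. Also, why do all the inputs stop before the end date, as in the source file (see below)?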
3408,2019-04-12,34.5,35.235,34.69,34.99
3407,2019-04-11,33.91,34.97,34.81,33.99
3406,2019-04-10,33.09,34.13,34.02,33.76
3405,2019-04-09,32.6,33.52,33.31,33.37
3404,2019-04-08,33.44,33.9529,33.88,33.64
3403,2019-04-05,33.88,34.4,34.06,33.97
3402,2019-04-04,33.35,34.12,33.93,33.96