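Resolving ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by MinMaxScaler

Time: 2019-04-14 17:31:57

Tags: python tensorflow scikit-learn

I am using the following code. First, the mid price is calculated from the highest and lowest prices.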

high_prices = df.loc[:,'High'].as_matrix()
low_prices = df.loc[:,'Low'].as_matrix()
mid_prices = (high_prices+low_prices)/2.0
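`as_matrix` was deprecated in pandas 0.23 and removed in pandas 1.0, so on a newer installation the two lines above fail outright instead of just warning. A minimal sketch of the same calculation with `to_numpy`, assuming a tiny illustrative DataFrame (the values are made up and stand in for the real price file):

```python
import pandas as pd

# Illustrative frame standing in for the real price data (values are made up).
df = pd.DataFrame({
    "Date": ["2019-04-10", "2019-04-11", "2019-04-12"],
    "High": [34.0, 35.0, 36.0],
    "Low": [33.0, 34.0, 34.0],
})

# to_numpy() is the replacement for the removed as_matrix()
high_prices = df.loc[:, "High"].to_numpy(dtype=float)
low_prices = df.loc[:, "Low"].to_numpy(dtype=float)
mid_prices = (high_prices + low_prices) / 2.0
print(mid_prices)
```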

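Now you can split the training data and the test data. The training data will be the first 11,000 data points of the time series, and the rest will be test data.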

train_data = mid_prices[:11000]
test_data = mid_prices[11000:]
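One thing worth checking at this step: the split point 11000 is hard-coded, and NumPy slicing past the end of an array does not raise an error; it silently returns fewer (or zero) rows. A minimal sketch, assuming a series of 3,409 points (the CSV excerpt at the end of the question ends at index 3408, so the exact length is an assumption):

```python
import numpy as np

# Stand-in for mid_prices; 3409 points is an assumption based on the last CSV index (3408).
mid_prices = np.linspace(30.0, 35.0, 3409)

train_data = mid_prices[:11000]   # the slice end is clipped to the array length
test_data = mid_prices[11000:]    # the start is past the end, so this is empty

print(train_data.shape)
print(test_data.shape)

# A window that starts beyond the available data is empty as well
empty_window = train_data.reshape(-1, 1)[5000:7500, :]
print(empty_window.shape)
```

With a series this short, the third print shows a window of shape (0, 1), the same shape that appears in the error message.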

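Now you need to define a scaler to normalize the data. MinMaxScaler scales all the data to the range 0 to 1. You can also reshape the training and test data to the shape [data_size, num_features]. When scaling, remember to normalize both the test and the training data with respect to the training data only, because you are not supposed to have access to the test data.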

scaler = MinMaxScaler()
train_data = train_data.reshape(-1,1)
test_data = test_data.reshape(-1,1)
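To illustrate what normalizing with respect to the training data means, here is a minimal sketch with made-up numbers: the scaler is fitted on the training column only, and the same fitted scaler is then applied to the test column, so test values outside the training range land outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values; reshape(-1, 1) gives the shape [data_size, num_features]
train = np.array([10.0, 20.0, 30.0]).reshape(-1, 1)
test = np.array([25.0, 40.0]).reshape(-1, 1)

scaler = MinMaxScaler()
scaler.fit(train)                       # learns min and max from the training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)    # reuses the training min and max

print(train_scaled.ravel())
print(test_scaled.ravel())
```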

Train the scaler with the training data and smooth the data.

smoothing_window_size = 2500
for di in range(0,10000,smoothing_window_size):
    scaler.fit(train_data[di:di+smoothing_window_size,:])
    train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])

Then normalize the last bit of remaining data.

scaler.fit(train_data[di+smoothing_window_size:,:])
train_data[di+smoothing_window_size:,:] = scaler.transform(train_data[di+smoothing_window_size:,:])
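The loop above assumes at least 10,000 training rows plus a remainder. As a sketch of one possible adjustment (not the tutorial's code), the window starts can be derived from the actual length so that no window is ever empty. The helper name `normalize_in_windows` is hypothetical, and plain NumPy min-max scaling stands in for `MinMaxScaler` here to keep the example short:

```python
import numpy as np

def normalize_in_windows(data, window_size):
    """Min-max scale each consecutive window of a 1-D array to [0, 1].

    Hypothetical helper: window starts come from len(data), so every
    slice holds at least one sample, however short the series is.
    """
    out = np.asarray(data, dtype=float).copy()
    for start in range(0, len(out), window_size):
        window = out[start:start + window_size]
        lo, hi = window.min(), window.max()
        span = hi - lo
        # A constant window has zero span; map it to 0 instead of dividing by zero
        out[start:start + window_size] = (window - lo) / span if span > 0 else 0.0
    return out

# 3409 points is an assumed length, shorter than the 10,000 the loop above expects
scaled = normalize_in_windows(np.arange(3409, dtype=float), 2500)
print(scaled.min(), scaled.max())
```

Note that the last window can be much shorter than the others, so its scaling may be based on only a few samples.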

Reshape the data back to the shape [data_size]. This applies to both the training and the test data.

train_data = train_data.reshape(-1)

Normalize the test data.

test_data = scaler.transform(test_data).reshape(-1)

Now perform exponential moving average smoothing, so that the data has a smoother curve than the original ragged data.

EMA = 0.0
gamma = 0.1
for ti in range(11000):
  EMA = gamma*train_data[ti] + (1-gamma)*EMA
  train_data[ti] = EMA
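The loop bound 11000 is hard-coded here as well; with fewer than 11,000 training points, `train_data[ti]` would raise an IndexError. A minimal sketch of the same smoothing that iterates over the actual length instead (the function name `ema_smooth` is hypothetical):

```python
import numpy as np

def ema_smooth(values, gamma=0.1):
    """Exponential moving average, starting from EMA = 0.0 as in the loop above."""
    smoothed = np.empty(len(values), dtype=float)
    ema = 0.0
    for i, value in enumerate(values):
        ema = gamma * value + (1 - gamma) * ema
        smoothed[i] = ema
    return smoothed

smoothed = ema_smooth(np.array([1.0, 1.0, 1.0]))
print(smoothed)
```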

This is used for visualization and testing.

all_mid_data = np.concatenate([train_data,test_data],axis=0)

In other words, you say that the prediction at t + 1 is the average of all the stock prices you observed within the window from t to t−N.
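Written as a formula, this is a sketch that follows the code below, where the window holds the N most recent observations up to and including time t. The symbols are my own notation: x_i is the mid price at step i, and N is the window length (`window_size` in the code, where the variable `N` instead holds the number of training points).

```latex
\hat{x}_{t+1} = \frac{1}{N} \sum_{i=t-N+1}^{t} x_i
```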

window_size = 100
N = train_data.size
std_avg_predictions = []
std_avg_x = []
mse_errors = []

for pred_idx in range(window_size,N):

    if pred_idx >= N:
        date = dt.datetime.strptime(k, '%Y-%m-%d').date() + dt.timedelta(days=1)
    else:
        date = df.loc[pred_idx,'Date']

    std_avg_predictions.append(np.mean(train_data[pred_idx-window_size:pred_idx]))
    mse_errors.append((std_avg_predictions[-1]-train_data[pred_idx])**2)
    std_avg_x.append(date)

print('MSE error for standard averaging: %.5f'%(0.5*np.mean(mse_errors)))


plt.figure(figsize = (18,9))
plt.plot(range(df.shape[0]),all_mid_data,color='b',label='True')
plt.plot(range(window_size,N),std_avg_predictions,color='orange',label='Prediction')
plt.xticks(range(0,df.shape[0],50),df['Date'].loc[::50],rotation=45)
plt.xlabel('Date')
plt.ylabel('Mid Price')
plt.legend(fontsize=18)
plt.show()

There are some warnings:

hp@hp-desktop:~$ python3 predict_stock.py 
Loaded data from the Kaggle repository
predict_stock.py:76: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  high_prices = df.loc[:,'High'].as_matrix()
predict_stock.py:77: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  low_prices = df.loc[:,'Low'].as_matrix()
MSE error for standard averaging: 0.00418

But it works fine, except that the date format cannot be changed. When using Alpha Vantage as the data source, this is the output:

predict_stock.py:76: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  high_prices = df.loc[:,'High'].as_matrix()
predict_stock.py:77: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  low_prices = df.loc[:,'Low'].as_matrix()
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:595: DataConversionWarning: Data with input dtype object was converted to float64 by MinMaxScaler.
  warnings.warn(msg, DataConversionWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:595: DataConversionWarning: Data with input dtype object was converted to float64 by MinMaxScaler.
  warnings.warn(msg, DataConversionWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:595: DataConversionWarning: Data with input dtype object was converted to float64 by MinMaxScaler.
  warnings.warn(msg, DataConversionWarning)
Traceback (most recent call last):
  File "predict_stock.py", line 98, in <module>
    scaler.fit(train_data[di:di+smoothing_window_size,:])
  File "/usr/local/lib/python3.6/dist-packages/sklearn/preprocessing/data.py", line 334, in fit
    return self.partial_fit(X, y)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/preprocessing/data.py", line 362, in partial_fit
    force_all_finite="allow-nan")
  File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 582, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by MinMaxScaler.
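The final ValueError can be reproduced in isolation: `MinMaxScaler.fit` rejects an array with zero rows. A minimal sketch, where the empty array stands in for a slice such as `train_data[di:di+smoothing_window_size,:]` taken past the end of the data (that this is what happens in the run above is an assumption):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

empty_window = np.empty((0, 1))  # zero samples, one feature

try:
    MinMaxScaler().fit(empty_window)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Printing `train_data.shape` just before the loop would confirm or rule out this explanation.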

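The CSV file was downloaded correctly and the first plot, showing the actual data, was saved. The second plot, with the predictions and the MSE, was not created.

There are some problems with the image [AAL plot]:

In the plot above, you can see that the prediction (orange line) stops before the end of the data, so it looks like a plain moving average. Also, why does all the input stop before the last date in the source file (see below)?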

3408,2019-04-12,34.5,35.235,34.69,34.99
3407,2019-04-11,33.91,34.97,34.81,33.99
3406,2019-04-10,33.09,34.13,34.02,33.76
3405,2019-04-09,32.6,33.52,33.31,33.37
3404,2019-04-08,33.44,33.9529,33.88,33.64
3403,2019-04-05,33.88,34.4,34.06,33.97
3402,2019-04-04,33.35,34.12,33.93,33.96

0 answers:

No answers yet.