拥有4行特征的pandas数据框,我从" forecast_col"创建标签。并将它们移回过去以便稍后进行预测:
pandasdf['label'] = pandasdf[forecast_col].shift(-forecast_out)
取除'标签以外的所有行'柱:
X = np.array(pandasdf.drop(['label'], 1))
规范化数据:
X = preprocessing.scale(X)
将最后一行用于未来预测:
X_lately = X[-forecast_out:]
选择培训和交叉验证数据:
X = X[:-forecast_out]
y = np.array(pandasdf['label'])[:-forecast_out]
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.3)
训练分类器:
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
检查准确性 - 它约为95%: accuracy = clf.score(X_test,y_test)
预测最后数据:
forecast_set = clf.predict(X_lately)
在这里,我应该得到" forecast_out"的未来价格清单。期间,但我预测了相同的最后数据(X_lately)价格
以下是这个例子: forecasting the past
我做错了什么?