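I have a stock dataset and I am trying to predict the "last" price 5 days ahead. I am using RandomForestRegressor to predict the "last" price. This is the time series of the "last" price.
Here are the predicted prices: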
[ 56.45819 56.56159 57.05353 57.64981 57.85363]
And the R-squared score:
0.999883402515
Looking at the time series, the last price increases over time, but my predictions do not reflect this. The predicted values stay in the 50-60 range.
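To check whether a narrow prediction range can come from the model itself rather than from my data, here is a minimal sketch on synthetic data. It is not my stock data: the linear series and all variable names are made up for illustration. It fits a RandomForestRegressor on a steadily increasing series and asks it to predict points that lie after the training period. Because a forest averages training targets stored in its leaves, its predictions should not exceed the largest target seen during training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic, steadily increasing "price" series (illustrative only).
t = np.arange(200, dtype=float).reshape(-1, 1)  # time index as the only feature
price = 50.0 + 0.1 * t.ravel()                   # rises steadily from 50.0

# Fit on the first 150 points, then predict the remaining 50 (later dates).
t_train, y_train = t[:150], price[:150]
t_future, y_future = t[150:], price[150:]

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(t_train, y_train)
pred_future = rf.predict(t_future)

print("max training target:", y_train.max())
print("max prediction     :", pred_future.max())
print("true future range  :", y_future.min(), "-", y_future.max())
```

If the largest prediction stays at or below the largest training target while the true future values are all higher, that would be consistent with what I see on my own data.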
Here is my code:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


class StockPricePrediction(object):
    trainData = pd.DataFrame()
    testData = pd.DataFrame()
    fullData = pd.DataFrame()
    features = []
    target_col = ''

    def __init__(self, priceTrainData, priceTestData, priceFullData, featuresOfInterest, target_column):
        # read the CSV data sets
        self.trainData = pd.read_csv(priceTrainData)
        self.testData = pd.read_csv(priceTestData)
        self.fullData = pd.read_csv(priceFullData)
        # NOTE: featuresOfInterest and target_column are not used; the values below are hard-coded
        self.features = ['low', 'high', 'open', 'annualized_volatility', 'weekly_return',
                         'daily_average_volume_10', 'daily_average_volume_30',
                         'market_cap', 'monthly_return']
        self.target_col = 'last'
        # tag each row with the set it came from
        self.trainData['Type'] = 'Train'
        self.testData['Type'] = 'Test'

    def predict(self, numberOfPredictions):
        print("features")
        print(self.features)
        x_train = self.trainData[list(self.features)].values
        y_train = self.trainData[self.target_col].values
        x_test = self.testData[list(self.features)].values
        print("Train data:")
        rf = RandomForestRegressor(n_estimators=1000)
        rf.fit(x_train, y_train)
        # predictions for the first numberOfPredictions rows of the test set
        status = rf.predict(x_test[0:numberOfPredictions])
        print("Train Data Predictions:")
        print(status)
        print("Train Data R square value score:")
        # R^2 is evaluated on the same rows the model was fitted on
        print(rf.score(x_train, y_train))
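The score printed by this code is computed on the training rows, and the target is the same-day "last" price rather than the price 5 days later, so a value close to 1 may not say much about 5-day-ahead accuracy. Below is a minimal sketch of a check I could run instead, assuming the rows are ordered by date: build the target by shifting "last" five rows into the future, split chronologically, and score on the later rows. The synthetic random-walk data, the helper name make_horizon_target, the reduced feature list, and the 80/20 split are all assumptions for illustration, not part of my actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def make_horizon_target(df, target_col="last", horizon=5):
    """Hypothetical helper: add a 'target' column holding target_col `horizon` rows ahead."""
    out = df.copy()
    out["target"] = out[target_col].shift(-horizon)
    # The final `horizon` rows have no future value yet, so drop them.
    return out.dropna(subset=["target"])


# Synthetic date-ordered data standing in for the real CSV (illustrative only).
rng = np.random.default_rng(0)
last = 50.0 + np.cumsum(rng.normal(0.05, 0.5, size=500))
df = pd.DataFrame({
    "open": last + rng.normal(0, 0.2, size=500),
    "low": last - np.abs(rng.normal(0, 0.3, size=500)),
    "high": last + np.abs(rng.normal(0, 0.3, size=500)),
    "last": last,
})

features = ["low", "high", "open"]
data = make_horizon_target(df, target_col="last", horizon=5)

# Chronological split: earlier rows for fitting, later rows for evaluation.
split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(train[features].values, train["target"].values)

train_r2 = rf.score(train[features].values, train["target"].values)
test_r2 = rf.score(test[features].values, test["target"].values)
print("R^2 on training rows:", train_r2)
print("R^2 on held-out rows:", test_r2)
```

Comparing the two printed scores should give a fairer picture of how the model does on dates it has not seen than the training score alone.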
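I would like to know where I went wrong. Any suggestions for a different algorithm, or for how to improve this one, would be appreciated.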