Random Forest Regression: How to improve the implementation

Asked: 2016-07-23 07:46:37

Tags: python machine-learning scikit-learn regression random-forest

I have a stock data set, and I am trying to predict the "last" price 5 days ahead. I am using RandomForestRegressor to predict the "last" price. Here is the time series of the "last" price: [image]
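One way to frame "5 days ahead" as a supervised problem is to pair each day's features with the "last" price observed 5 rows later. The sketch below assumes one row per trading day in chronological order; the tiny frame and the `target_5d` column name are made up for illustration and are not taken from the actual data set.

```python
import pandas as pd

# Hypothetical toy frame: one row per trading day, oldest first.
prices = pd.DataFrame({
    "open": [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5],
    "last": [10.2, 10.8, 11.1, 11.9, 12.3, 12.6, 13.4, 13.8],
})

horizon = 5  # predict the price this many rows (days) ahead

# shift(-horizon) moves future values up, so row i holds last[i + horizon].
prices["target_5d"] = prices["last"].shift(-horizon)

# The final `horizon` rows have no known future price yet; drop them for training.
train = prices.dropna(subset=["target_5d"])
print(train)
```

With a target built this way, the model is fitted to predict a future price rather than the same day's price.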

Here are the predicted prices:

[ 56.45819  56.56159  57.05353  57.64981  57.85363]

And the R-squared score:

0.999883402515 

Looking at the time series, the last price increases over time, but my predictions do not reflect that. The predictions all fall in the 50-60 range.
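A possible explanation for flat predictions like these: a random forest averages target values it saw during training, so its output cannot rise above the largest training target (or fall below the smallest). The minimal sketch below uses synthetic data, not the stock data set, to show this on a simple upward trend.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic upward trend: y = 2 * x for x in 0..99 (illustrative data only).
x_train = np.arange(100, dtype=float).reshape(-1, 1)
y_train = 2.0 * x_train.ravel()

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(x_train, y_train)

# Ask for predictions well beyond the training range of x.
x_future = np.array([[150.0], [200.0], [500.0]])
future_pred = rf.predict(x_future)

# Each tree predicts a mean of training targets, so the forest's output
# stays within [y_train.min(), y_train.max()] even though the trend continues.
print(future_pred)
print(y_train.max())
```

If the test period contains prices higher than anything in the training period, this capping effect may apply. Modelling returns or price differences instead of raw price levels is one commonly suggested workaround.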

Here is the code:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor


class StockPricePrediction(object):

    trainData = pd.DataFrame()
    testData = pd.DataFrame()
    fullData = pd.DataFrame()
    features = []
    target_col = ''

    def __init__(self, priceTrainData, priceTestData, priceFullData, featuresOfInterest, target_column):

        # Read the CSV data sets
        self.trainData = pd.read_csv(priceTrainData)
        self.testData = pd.read_csv(priceTestData)
        self.fullData = pd.read_csv(priceFullData)

        # Note: featuresOfInterest and target_column are accepted but not used;
        # the feature list and target column are hardcoded here.
        self.features = ['low', 'high', 'open', 'annualized_volatility', 'weekly_return','daily_average_volume_10','daily_average_volume_30','market_cap', 'monthly_return']

        self.target_col = 'last'

        self.trainData['Type'] = 'Train'
        self.testData['Type'] = 'Test'

    def predict(self, numberOfPredictions):
        """Fit a random forest on the training set and predict the first test rows."""
        print("features")
        print(self.features)

        x_train = self.trainData[list(self.features)].values
        y_train = self.trainData[self.target_col].values
        x_test = self.testData[list(self.features)].values

        # Fit on the training rows only
        print("Train data:")
        rf = RandomForestRegressor(n_estimators=1000)
        rf.fit(x_train, y_train)

        # Predict the first numberOfPredictions rows of the test set
        status = rf.predict(x_test[0:numberOfPredictions])
        print("Test data predictions:")
        print(status)

        # Note: this R-squared is measured on the training data itself
        print("Train data R-squared score:")
        print(rf.score(x_train, y_train))
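The score printed by this code is computed on the same rows the model was fitted on, which tends to look optimistic. Here is a hedged sketch of scoring on later, unseen rows instead, using synthetic data and a simple chronological split (the 80/20 cut and the data itself are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic trending series standing in for a price history (not real data).
n = 300
t = np.arange(n, dtype=float)
y = 0.1 * t + rng.normal(scale=0.5, size=n)
X = t.reshape(-1, 1)

# Chronological split: fit on the first 80% of rows, evaluate on the last 20%.
split = int(n * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(X_train, y_train)

train_r2 = rf.score(X_train, y_train)  # R^2 on rows the model has seen
test_r2 = rf.score(X_test, y_test)     # R^2 on later, unseen rows

print("train R^2:", train_r2)
print("test R^2:", test_r2)
```

A large gap between the two numbers suggests that a near-perfect training score says little about how well the model forecasts future prices.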

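I would like to know where I am going wrong. Any suggestions for a different algorithm, or for how to improve this one, would be appreciated.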

0 answers:

No answers yet.