Error when predicting a time series with linear regression

Asked: 2018-03-10 15:14:47

Tags: python-3.x numpy scikit-learn

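I am trying to build a simple regression model to predict future values of a time series data set (accuracy/error is not important). At the moment I get this error: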

TypeError: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'

My current code looks like this:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def run_linear_model(data_set):
    features = np.array(data_set.index)  # datetime64[ns] values taken from the index
    labels = np.array(data_set['Price'])
    training_features, testing_features, training_labels, testing_labels = train_test_split(features, labels, train_size=0.8, test_size=0.2, shuffle=False)
    clf = LinearRegression()
    clf.fit(training_features.reshape(-1, 1), training_labels)
    results = clf.predict(testing_features.reshape(-1, 1))

where the variable data_set is a DataFrame in the following format:

                       Open    High      Low   Close      Price
datetime                                                        
2018-03-09 08:01:00  1701.00  1703.2  1697.00  1701.8  1700.7500
2018-03-09 08:13:00  1705.60  1706.0  1703.40  1703.4  1704.6000
2018-03-09 08:25:00  1708.40  1709.2  1706.80  1706.8  1707.8000
2018-03-09 08:37:00  1708.40  1708.6  1706.40  1706.4  1707.4500
2018-03-09 08:49:00  1710.00  1713.6  1709.88  1712.6  1711.5200

1 answer:

Answer 0 (score: 1)

It looks like your features array has the datetime64 dtype, but the linear regression library expects an array of floats. Try this:

import numpy as np

dates = np.array(data_set.index)
unix_epoch = np.datetime64(0, 's')
one_second = np.timedelta64(1, 's')
features = (dates - unix_epoch) / one_second  # float seconds since Jan 1 1970
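
For completeness, here is a minimal end-to-end sketch of that conversion feeding the regression. The helper name to_epoch_seconds and the small sample frame (five rows copied from the question, Price column only) are my own assumptions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def to_epoch_seconds(index):
    """Convert a (timezone-naive) DatetimeIndex to float seconds since 1970-01-01."""
    dates = index.to_numpy()  # numpy datetime64 array
    return (dates - np.datetime64(0, "s")) / np.timedelta64(1, "s")


# Hypothetical sample data: the five rows shown in the question.
data_set = pd.DataFrame(
    {"Price": [1700.75, 1704.60, 1707.80, 1707.45, 1711.52]},
    index=pd.to_datetime(
        [
            "2018-03-09 08:01:00",
            "2018-03-09 08:13:00",
            "2018-03-09 08:25:00",
            "2018-03-09 08:37:00",
            "2018-03-09 08:49:00",
        ]
    ),
)

# scikit-learn expects a 2-D float array of shape (n_samples, n_features).
features = to_epoch_seconds(data_set.index).reshape(-1, 1)
labels = data_set["Price"].to_numpy()

# shuffle=False keeps the rows in time order, so the test set is the most recent data.
training_features, testing_features, training_labels, testing_labels = train_test_split(
    features, labels, test_size=0.2, shuffle=False
)

clf = LinearRegression()
clf.fit(training_features, training_labels)
results = clf.predict(testing_features)
print(results)
```

Because the features are now plain floats, fit and predict no longer have to cast datetime64 values. If the large epoch values are awkward to read, subtracting the first timestamp from every feature should only change the intercept, not the slope.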