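I'm trying to build a simple regression model to predict future values of a time-series dataset (accuracy/error doesn't matter). At the moment I get this error: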
TypeError: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'
My current code looks like this:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def run_linear_model(data_set):
    features = np.array(data_set.index)
    labels = np.array(data_set['Price'])
    training_features, testing_features, training_labels, testing_labels = train_test_split(features, labels, train_size=0.8, test_size=0.2, shuffle=False)
    clf = LinearRegression()
    clf.fit(training_features.reshape(-1, 1), training_labels)
    results = clf.predict(testing_features.reshape(-1, 1))
where the variable data_set is a DataFrame of the form:
                        Open    High      Low   Close      Price
datetime
2018-03-09 08:01:00  1701.00  1703.2  1697.00  1701.8  1700.7500
2018-03-09 08:13:00  1705.60  1706.0  1703.40  1703.4  1704.6000
2018-03-09 08:25:00  1708.40  1709.2  1706.80  1706.8  1707.8000
2018-03-09 08:37:00  1708.40  1708.6  1706.40  1706.4  1707.4500
2018-03-09 08:49:00  1710.00  1713.6  1709.88  1712.6  1711.5200
Answer 0 (score: 1)
It looks like your features array has the datetime64 dtype, but the linear regression library expects an array of floats. Try this:
dates = np.array(data_set.index)
unix_epoch = np.datetime64(0, 's')
one_second = np.timedelta64(1, 's')
features = (dates - unix_epoch) / one_second  # seconds since Jan 1, 1970
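To check the conversion end to end, here is a minimal sketch that uses only NumPy. The timestamps and prices are the five sample rows from the question. Two things are my own assumptions, not part of the original code: np.polyfit stands in for LinearRegression so the snippet has no scikit-learn dependency, and time is measured from the first sample (the elapsed array) so the fit works with small numbers.

```python
import numpy as np

# Timestamps and prices copied from the question's sample rows.
dates = np.array(
    [
        "2018-03-09T08:01:00",
        "2018-03-09T08:13:00",
        "2018-03-09T08:25:00",
        "2018-03-09T08:37:00",
        "2018-03-09T08:49:00",
    ],
    dtype="datetime64[ns]",
)
prices = np.array([1700.75, 1704.60, 1707.80, 1707.45, 1711.52])

# Convert datetime64 values to float seconds since the Unix epoch.
unix_epoch = np.datetime64(0, "s")
one_second = np.timedelta64(1, "s")
features = (dates - unix_epoch) / one_second

# Assumption: measure time from the first sample to keep the numbers small.
# This shifts the intercept but leaves the slope unchanged.
elapsed = features - features[0]

# Stand-in for LinearRegression: a degree-1 least-squares fit.
slope, intercept = np.polyfit(elapsed, prices, 1)

# Predict the price 12 minutes after the last sample.
next_elapsed = elapsed[-1] + 12 * 60
predicted = slope * next_elapsed + intercept
print(features.dtype, predicted)
```

With scikit-learn, the same float array should be usable in the original function unchanged, i.e. passed through train_test_split and then to clf.fit(training_features.reshape(-1, 1), training_labels).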