Random forest: predicting a time value

Date: 2020-03-03 14:19:05

Tags: python datetime random-forest

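I want to predict 'diffseconds' (the last column) from a timestamp and some other string columns. I don't understand why my timestamp can't be used as training data. I'm an English learner, so I apologize for any grammar mistakes. Thanks!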

Here is the data (safe links). Image preview: https://tlgur.com/d/gvqzMLQG and .csv: https://tlgur.com/d/GEzpYENg

Here is the error (image): https://tlgur.com/d/Gozay208

Here is my code:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv(r'C:\Users\12232\Desktop\bb.csv')

# Features: the timestamp plus two categorical columns.
x = df[["TrainActualTime"]].copy()  # copy to avoid SettingWithCopyWarning
x["TrainActualTime"] = pd.to_datetime(x["TrainActualTime"], format='%d/%m/%Y %H:%M:%S')
x["loc_stanox"] = df["loc_stanox"].apply(str)
x["Event_type"] = df["Event_type"]
x = pd.get_dummies(x)  # one-hot encodes the string columns; TrainActualTime stays datetime64

# Target: the last column, taken from the same DataFrame.
y = df["diffseconds"]
print(x)
# x.to_csv(r'C:\Users\12232\Desktop\x.csv')

X_train, X_test, y_train, y_test = train_test_split(x, y, train_size=0.8, random_state=1)
rf = RandomForestRegressor(n_estimators=1000)
rf.fit(X_train, y_train)

print(rf.predict(X_train))
print("Test score: %f" % rf.score(X_test, y_test))
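
A possible cause, which the error screenshot does not confirm: after pd.to_datetime the TrainActualTime column has dtype datetime64, and pd.get_dummies leaves it untouched, so the regressor is handed a non-numeric feature. Below is a minimal sketch of one workaround, which replaces the timestamp with plain numeric parts before fitting. The sample rows, the add_time_features helper, and the choice of derived features are all assumptions for illustration; they are not taken from the original data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def add_time_features(frame, column="TrainActualTime"):
    """Hypothetical helper: swap a timestamp column for numeric columns."""
    out = frame.copy()
    ts = pd.to_datetime(out[column], format="%d/%m/%Y %H:%M:%S")
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek
    # Seconds since midnight keeps the time of day as a single number.
    out["seconds_of_day"] = ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second
    return out.drop(columns=[column])


# Made-up rows that only mimic the column names used in the question.
df = pd.DataFrame({
    "TrainActualTime": [
        "03/03/2020 14:19:05", "03/03/2020 15:02:40", "04/03/2020 08:15:00",
        "04/03/2020 09:45:30", "05/03/2020 17:20:10", "05/03/2020 18:05:55",
        "06/03/2020 07:30:00", "06/03/2020 12:00:00", "07/03/2020 22:10:15",
        "07/03/2020 23:59:59",
    ],
    "loc_stanox": [87701, 87701, 87702, 87702, 87703, 87703, 87701, 87702, 87703, 87701],
    "Event_type": ["ARRIVAL", "DEPARTURE"] * 5,
    "diffseconds": [30, 60, 0, 120, 45, 90, 15, 75, 200, 10],
})

features = add_time_features(df[["TrainActualTime", "loc_stanox", "Event_type"]])
features["loc_stanox"] = features["loc_stanox"].astype(str)
# dtype=int keeps the one-hot columns numeric rather than boolean.
features = pd.get_dummies(features, columns=["loc_stanox", "Event_type"], dtype=int)
target = df["diffseconds"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, train_size=0.8, random_state=1
)
rf = RandomForestRegressor(n_estimators=100, random_state=1)
rf.fit(X_train, y_train)

predictions = rf.predict(X_test)
print(predictions)
```

Which derived features are useful depends on the data: time of day and day of week suit recurring schedules, while a single elapsed-seconds value may be a better fit if the target drifts over time.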

0 answers:

No answers yet