I want to do multiple linear regression with sklearn. I have 3 features (year, month, day) and I want to predict kwh. data1 is a DataFrame. What is wrong here?
dt = data1.drop(['kwh'], axis=1)
dt
year month day
date
2012-04-12 14:56:50 2012 4 12
2012-04-12 15:11:55 2012 4 12
2012-04-12 15:27:01 2012 4 12
2012-04-12 15:42:06 2012 4 12
2012-04-12 15:57:10 2012 4 12
2012-04-12 16:12:10 2012 4 12
2012-04-12 16:27:14 2012 4 12
2012-04-12 16:42:19 2012 4 12
2012-04-12 16:57:24 2012 4 12
2012-04-12 17:12:28 2012 4 12
2012-04-12 17:27:33 2012 4 12
2012-04-12 17:42:37 2012 4 12
2012-04-12 17:57:41 2012 4 12
2012-04-12 18:12:44 2012 4 12
2012-04-12 18:27:46 2012 4 12
2012-04-12 18:42:51 2012 4 12
2012-04-12 18:57:54 2012 4 12
2012-04-12 19:12:58 2012 4 12
2012-04-12 19:28:01 2012 4 12
2012-04-12 19:43:04 2012 4 12
2012-04-12 19:58:07 2012 4 12
2012-04-12 20:13:10 2012 4 12
2012-04-12 20:28:15 2012 4 12
2012-04-12 20:43:15 2012 4 12
2012-04-12 20:58:18 2012 4 12
2012-04-12 21:13:20 2012 4 12
2012-04-12 21:28:22 2012 4 12
2012-04-12 21:43:24 2012 4 12
2012-04-12 21:58:27 2012 4 12
2012-04-12 22:13:29 2012 4 12
2012-04-12 22:28:34 2012 4 12
2012-04-12 22:43:38 2012 4 12
2012-04-12 22:58:43 2012 4 12
2012-04-12 23:13:43 2012 4 12
2012-04-12 23:28:46 2012 4 12
2012-04-12 23:43:55 2012 4 12
2012-04-12 23:59:00 2012 4 12
2012-04-13 00:14:02 2012 4 13
2012-04-13 00:29:05 2012 4 13
2012-04-13 00:44:09 2012 4 13
2012-04-13 00:59:09 2012 4 13
2012-04-13 01:14:10 2012 4 13
2012-04-13 01:29:11 2012 4 13
2012-04-13 01:44:16 2012 4 13
2012-04-13 01:59:22 2012 4 13
2012-04-13 02:14:21 2012 4 13
2012-04-13 02:29:24 2012 4 13
2012-04-13 02:44:24 2012 4 13
2012-04-13 02:59:25 2012 4 13
2012-04-13 03:14:30 2012 4 13
2012-04-13 03:29:31 2012 4 13
2012-04-13 03:44:31 2012 4 13
2012-04-13 03:59:42 2012 4 13
2012-04-13 04:14:43 2012 4 13
2012-04-13 04:29:43 2012 4 13
2012-04-13 04:44:46 2012 4 13
2012-04-13 04:59:47 2012 4 13
2012-04-13 05:14:48 2012 4 13
2012-04-13 05:29:49 2012 4 13
2012-04-13 05:44:50 2012 4 13
... ... ...
65701 rows × 3 columns
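import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hold out 40% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(dt, data1['kwh'], test_size=0.4)
clf = LinearRegression()
clf.fit(x_train, y_train)
# The scatter call below is the line that raises the ValueError
plt.scatter(x_test, y_test)
plt.plot(x_test, clf.predict(x_test), color='blue',
         linewidth=3)
plt.show()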
This is the error:
ValueError Traceback (most recent call last)
<ipython-input-97-a4b702fcee3d> in <module>()
----> 1 plt.scatter(x_test, y_test)
2 plt.plot(x_test, clf.predict(x_test), color='blue',
3 linewidth=3)
4 plt.show()
/usr/lib/pymodules/python2.7/matplotlib/pyplot.pyc in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, hold, **kwargs)
3085 ret = ax.scatter(x, y, s=s, c=c, marker=marker, cmap=cmap, norm=norm,
3086 vmin=vmin, vmax=vmax, alpha=alpha,
-> 3087 linewidths=linewidths, verts=verts, **kwargs)
3088 draw_if_interactive()
3089 finally:
/usr/lib/pymodules/python2.7/matplotlib/axes.pyc in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, **kwargs)
6254 y = np.ma.ravel(y)
6255 if x.size != y.size:
-> 6256 raise ValueError("x and y must be the same size")
6257
6258 s = np.ma.ravel(s) # This doesn't have to match x, y in size.
ValueError: x and y must be the same size
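For context on the message: the traceback shows that scatter flattens both arguments and compares their sizes. x_test has three columns (year, month, day), so it holds three values per row while y_test holds one, and the sizes cannot match. The fit itself is not what fails. Below is a minimal sketch of one possible way to plot the result, one feature at a time. It assumes a recent scikit-learn (where train_test_split lives in sklearn.model_selection) and uses a synthetic stand-in for data1, since the real data is not shown; the helper name plot_one_feature is hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data1 (made-up kwh values, not the real data).
rng = np.random.default_rng(0)
dates = pd.date_range("2012-04-12 14:56:50", periods=200, freq="15min")
data1 = pd.DataFrame({"kwh": rng.normal(1.0, 0.1, size=len(dates))}, index=dates)
data1["year"] = data1.index.year
data1["month"] = data1.index.month
data1["day"] = data1.index.day

# Same steps as in the question: 3 feature columns, 1 target column.
dt = data1.drop(["kwh"], axis=1)
x_train, x_test, y_train, y_test = train_test_split(
    dt, data1["kwh"], test_size=0.4, random_state=0
)

clf = LinearRegression()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# The cause of the ValueError: x_test has 3 values per row, y_test has 1.
size_x = x_test.values.size  # number of test rows * 3
size_y = y_test.values.size  # number of test rows


def plot_one_feature(feature):
    """Scatter one feature column against kwh, with predictions overlaid."""
    fig, ax = plt.subplots()
    ax.scatter(x_test[feature], y_test, label="actual")
    ax.scatter(x_test[feature], y_pred, color="blue", label="predicted")
    ax.set_xlabel(feature)
    ax.set_ylabel("kwh")
    ax.legend()
    return fig
```

Calling plot_one_feature("day") followed by plt.show() draws one such plot. The predictions are drawn as points rather than a line because, with three features, the fitted model is a plane in feature space and does not reduce to a single line over one column.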