I wrote a simple script that generates random sample data and fits a regression to it:
```python
import matplotlib.pyplot as plt
import numpy as np
import sklearn.datasets
import sklearn.linear_model as lm
##########################################
n = np.random.randint(1,10)
b = np.random.randint(50,200)
X1_, Y1_ = sklearn.datasets.make_regression(n_samples=100, n_features=1, noise=n, bias=b)
X1 = X1_.reshape(len(X1_), 1)
Y1 = Y1_.reshape(len(Y1_), 1)
##########################################
x = np.array(X1)
y = np.array(Y1)
##########################################
lr = lm.LinearRegression()
lr.fit(x, y)
td = np.arange(1, 101, 1).reshape(100, 1)
n_y = lr.predict(td)
##########################################
f, ax = plt.subplots(1, 2, sharey=True)
ax[0].scatter(x, y)
ax[0].set_xlim([-4, 4])
ax[0].set_title("x, y")
ax[1].plot(x, n_y, 'g')
ax[1].set_xlim([-4, 4])
ax[1].set_title("x_tr, y_lr")
f.suptitle("Regression")
plt.ylim(y.min()-1, y.max()+1)
##########################################
print("Array: {}\nType: {}\nShape: {}\nLength: {}\nData: {}\n".format("X1", type(X1), str(np.shape(X1)), len(X1), str(X1)))
print("Array: {}\nType: {}\nShape: {}\nLength: {}\nData: {}\n".format("Y1", type(Y1), str(np.shape(Y1)), len(Y1), str(Y1)))
print("Array: {}\nType: {}\nShape: {}\nLength: {}\nData: {}\n".format("x", type(x), str(np.shape(x)), len(x), str(x)))
print("Array: {}\nType: {}\nShape: {}\nLength: {}\nData: {}\n".format("y", type(y), str(np.shape(y)), len(y), str(y)))
print("Array: {}\nType: {}\nShape: {}\nLength: {}\nData: {}\n".format("td", type(td), str(np.shape(td)), len(td), str(td)))
print("Array: {}\nType: {}\nShape: {}\nLength: {}\nData: {}\n".format("n_y", type(n_y), str(np.shape(n_y)), len(n_y), str(n_y)))
##########################################
plt.show()
```
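It appears to work and raises no errors, but I'm still worried about accuracy: the regression line always comes out at a random angle and with an odd shape. How can I test this? Which error-reporting functions should I be looking at?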
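A quick visual sanity check is to evaluate the fitted model at the same x values you plot it against. In the script above, `n_y` is predicted from `td` (the integers 1 to 100) but drawn against `x`, so each predicted value is paired with an unrelated x, and the unsorted `x` also makes the line zigzag. The sketch below is a minimal illustration under assumptions: `noise=5`, `bias=100` and `random_state=0` are fixed values chosen for reproducibility (the question draws them at random), and `x_grid`/`y_grid` are illustrative names, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
import sklearn.datasets
import sklearn.linear_model as lm

# Same kind of data as in the question, but with fixed (assumed) parameters
# so the result is reproducible. make_regression returns X of shape (100, 1)
# and y of shape (100,).
X, y = sklearn.datasets.make_regression(
    n_samples=100, n_features=1, noise=5, bias=100, random_state=0
)

lr = lm.LinearRegression()
lr.fit(X, y)

# Evaluate the model on an evenly spaced, sorted grid covering the observed
# x-range, so every predicted value is paired with the x it was computed from.
x_grid = np.linspace(X.min(), X.max(), 50).reshape(-1, 1)
y_grid = lr.predict(x_grid)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], y, label="data")
ax.plot(x_grid[:, 0], y_grid, "g", label="fitted line")
ax.set_title("Regression")
ax.legend()
```

Call `plt.show()` afterwards to display the figure. For a single-feature linear model the plotted line should be perfectly straight and pass through the middle of the point cloud; if it is not, the x/prediction pairing is the first thing to check.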
Answer 0 (score: 0)
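What you are seeing is a consequence of your data being random. Regression essentially tries to recover the distribution that generated the data, so there is some irony in trying to recover the distribution of a random generator, which is in fact designed to hide its distribution.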
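If you want to test a regression method, you should use some of the common ML datasets available online, for example the UCI ML dataset collection (filter for regression tasks): http://archive.ics.uci.edu/ml/datasets.html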
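To evaluate a model on a standard dataset and get numeric error reports, the usual approach is to hold out a test split and compute regression metrics from `sklearn.metrics`. The sketch below swaps in scikit-learn's bundled `load_diabetes` dataset so that nothing needs to be downloaded; it is not one of the UCI downloads linked above, and the 25% test split and `random_state=0` are arbitrary, assumed choices.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Small regression dataset shipped with scikit-learn (used here as a stand-in).
X, y = load_diabetes(return_X_y=True)

# Hold out part of the data so the error is measured on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Common regression error reports.
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)                      # in the same units as y
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)            # 1.0 is a perfect fit
print(f"RMSE: {rmse:.2f}  MAE: {mae:.2f}  R^2: {r2:.3f}")
```

RMSE and MAE are expressed in the units of the target, while R² is unitless, which makes it easier to compare across datasets. The same metric calls work unchanged on a dataset loaded from the UCI collection once it is split into `X` and `y`.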