Question

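I am trying to use SGDRegressor from scikit-learn for a simple linear regression problem, but my code gives a different RMSLE value on every run.

Why does this happen? Also, how can I get the minimum RMSLE value?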

import numpy as np
import pandas as pd
from math import sqrt
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

def rmsle(y_true, y_pred):
    # rmsle() was not defined in the original post; this assumes the usual
    # definition: RMSE computed on log(1 + y).
    return sqrt(mean_squared_error(np.log1p(y_true), np.log1p(y_pred)))

# Load data
train = pd.read_csv('Train.csv')
test = pd.read_csv('Test.csv')

# Split data: first 1000 rows for training, the rest held out for evaluation
x_train = train.GrLivArea[:1000].values.reshape(-1, 1)
y_train = train.SalePrice[:1000].values  # 1-D target, as fit() expects

# Log-transform the feature and the target
x_train_normal = np.log(x_train)
y_train_normal = np.log(y_train)

x_test = train.GrLivArea[1000:].values.reshape(-1, 1)
y_test = train.SalePrice[1000:].values

x_test_normal = np.log(x_test)
y_test_normal = np.log(y_test)

y_test_transform = np.exp(y_test_normal)  # back to the original price scale

# n_iter was replaced by max_iter in newer scikit-learn versions,
# so the iteration budget is passed to the constructor instead
Model = linear_model.SGDRegressor(max_iter=int(np.ceil(10**7 / len(y_train_normal))))
Model.fit(x_train_normal, y_train_normal)

Sale_Prices_Predicted = Model.predict(x_test_normal)
Sale_Prices_Predicted_Transform = np.exp(Sale_Prices_Predicted)

rmslee = rmsle(y_test_transform, Sale_Prices_Predicted_Transform)
print("RMSLE: ", rmslee)

For example:

0.28153047299638045
0.28190513681658363
0.28207666380009233
0.28126007334118047

Answer 1

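Quite simply, SGDRegressor is not initialized with the same random state on every run. If you want reproducible results, you need to fix the seed by passing an integer as random_state.

Different random initializations lead to slightly different results. This is very common in machine learning.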
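As a minimal sketch of fixing the seed, the snippet below fits SGDRegressor twice with the same random_state and checks that the two fits agree. The synthetic data, the helper name fit_params, and the seed value 42 are assumptions made for illustration; they are not taken from the question.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic stand-in data (an assumption, not the asker's Train.csv)
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 1))
y = 0.9 * X.ravel() + 5.0 + rng.normal(scale=0.2, size=200)

def fit_params(seed):
    # Hypothetical helper: fit with a fixed seed and return the learned parameters
    model = SGDRegressor(max_iter=1000, tol=1e-3, random_state=seed)
    model.fit(X, y)
    return model.coef_.copy(), model.intercept_.copy()

coef_a, intercept_a = fit_params(42)
coef_b, intercept_b = fit_params(42)

# With the same seed, both fits should produce identical parameters
same_seed_matches = bool(
    np.array_equal(coef_a, coef_b) and np.array_equal(intercept_a, intercept_b)
)
print("same seed, identical fit:", same_seed_matches)
```

Calling fit_params with a different integer would typically give slightly different parameters, which is the run-to-run variation seen in the question.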

This behavior occurs for any kind of model that involves random initialization:

Neural networks
Random forests
SVMs
etc.
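The same seed dependence can be illustrated with one of the model types listed above. This is a rough sketch using RandomForestRegressor on synthetic data; the data, the helper name forest_predictions, and the seed values are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption for illustration)
rng = np.random.RandomState(1)
X = rng.normal(size=(150, 2))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=150)

def forest_predictions(seed):
    # Hypothetical helper: train a small forest with the given seed
    forest = RandomForestRegressor(n_estimators=20, random_state=seed)
    forest.fit(X, y)
    return forest.predict(X[:5])

pred_a = forest_predictions(0)
pred_b = forest_predictions(0)

# Same seed gives the same bootstrap samples and splits, hence the same predictions
forest_reproducible = bool(np.array_equal(pred_a, pred_b))
print("same seed, identical predictions:", forest_reproducible)
```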

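SGDRegressor documentation: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html

random_state : int, RandomState instance or None, optional (default=None)

The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.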
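The three accepted forms of random_state described in the documentation excerpt can be tried side by side. The sketch below is only an illustration on synthetic data; the helper name fit_coef and the seed value 7 are assumptions. For the None case it seeds the global np.random generator before each fit, since that is the generator the excerpt says is used.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic stand-in data (an assumption for illustration)
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 1))
y = 0.9 * X.ravel() + 5.0 + rng.normal(scale=0.2, size=200)

def fit_coef(random_state):
    # Hypothetical helper: fit once and return a copy of the learned coefficient
    model = SGDRegressor(max_iter=1000, tol=1e-3, random_state=random_state)
    return model.fit(X, y).coef_.copy()

# 1) int: used as the seed
int_a, int_b = fit_coef(7), fit_coef(7)

# 2) RandomState instance: used directly as the generator (a fresh one per fit)
inst_a = fit_coef(np.random.RandomState(7))
inst_b = fit_coef(np.random.RandomState(7))

# 3) None: falls back to the global np.random generator, so seed it before each fit
np.random.seed(7)
none_a = fit_coef(None)
np.random.seed(7)
none_b = fit_coef(None)

int_ok = bool(np.array_equal(int_a, int_b))
inst_ok = bool(np.array_equal(inst_a, inst_b))
none_ok = bool(np.array_equal(none_a, none_b))
print(int_ok, inst_ok, none_ok)
```

Passing an int is usually the simplest choice, because it does not depend on any global state that other code might change.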
