Python - How do I cross-check the weights W and intercept B obtained from SGD for linear regression?

Date: 2018-11-25 08:16:39

Tags: python-3.x linear-regression stochastic

I have implemented SGD for linear regression by hand, using the partial derivatives. I am working with the Boston house-price dataset from scikit-learn. The input to my UDF is a (training) DataFrame holding the standardized features together with the target column.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    df.loc[:, df.columns != 'target'], df.target, test_size=0.15, random_state=42)
dt_scaler = StandardScaler().fit(X_train)
scaler_data = dt_scaler.transform(X_train)
final_ds = pd.DataFrame(scaler_data, columns=boston.feature_names)
final_ds['target'] = y_train  # note: this assignment aligns by index, not by position
final_ds.target = final_ds.target.fillna(df.target.mean())
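One likely source of trouble sits in this data preparation rather than in the SGD itself: `pd.DataFrame(scaler_data, ...)` gets a fresh `0..n-1` index, while `y_train` keeps the shuffled index from `train_test_split`, so `final_ds['target'] = y_train` aligns by label and leaves NaNs that the `fillna` then papers over with the global mean. A minimal sketch of the effect and the fix, on a tiny synthetic stand-in for the Boston data (all names here are illustrative):

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the features and target.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(10, 3)), columns=["a", "b", "c"])
y = pd.Series(rng.normal(size=10))

# train_test_split shuffles, so training rows keep their original labels.
idx = [3, 1, 7, 0, 9]
X_train, y_train = X.iloc[idx], y.iloc[idx]

# Rebuilding a DataFrame from the scaled NumPy array resets its index to
# 0..n-1; assigning y_train then aligns by label, not position, and every
# label outside 0..n-1 becomes NaN.
scaled = pd.DataFrame(X_train.to_numpy(), columns=X.columns)
scaled["target"] = y_train
n_misaligned = scaled["target"].isna().sum()   # > 0: silently wrong targets

# Fix: strip the index so the assignment is positional; the fillna is then
# unnecessary.
scaled["target"] = y_train.to_numpy()
```

With the positional assignment, every row keeps its own target, so no mean-imputation is needed and the SGD trains against the correct labels.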

Now, my UDF is:

def best_w_b(data):
    # Random initialisation of the weights and intercept.
    w0 = np.random.normal(0, 1, (boston.data.shape[1],))
    b0 = np.random.normal(0, 1)
    r = 1  # learning rate, halved after every epoch
    while True:
        k = data.sample(n=150, replace=True)
        der_w = np.zeros(boston.data.shape[1])
        der_b = 0.0
        # Accumulate the gradient over the whole mini-batch. (Resetting
        # der_w/der_b inside this loop would keep only the last sample's
        # contribution.)
        for i in range(150):
            x_i = k.iloc[i][k.columns != 'target'].values
            y_i = k.iloc[i].target
            resid = y_i - np.dot(w0, x_i) - b0
            der_w += -2 * x_i * resid
            der_b += -2 * resid
        w1 = w0 - r * der_w / 150
        b1 = b0 - r * der_b / 150
        # Stop once the update is negligible (exact equality of floats is
        # too strict a convergence test).
        if np.linalg.norm(w0 - w1) < 1e-6 and abs(b0 - b1) < 1e-6:
            return w1, b1
        w0, b0 = w1, b1
        r = r / 2

After running this I have W and B values, so if I compute the mean squared error on the training data, (1/n) * Σ_{i=0}^{n-1} (y_i - ŷ_i)², shouldn't I get a value close to 0?

The code is as follows:

error = 0
for i in range(X_train.shape[0]):
    error += (final_ds.iloc[i].target
              - (np.dot(optimal_w, final_ds.iloc[i][final_ds.columns != 'target']) + optimal_b)) ** 2
print(error / X_train.shape[0])
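As an aside, the row-by-row loop above can be replaced with a single matrix product, which is both faster and less error-prone. A sketch, using illustrative stand-ins for `final_ds`, `optimal_w` and `optimal_b` (the values here are random placeholders, not from the question):

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for final_ds, optimal_w and optimal_b.
rng = np.random.default_rng(1)
final_ds = pd.DataFrame(rng.normal(size=(50, 4)),
                        columns=["f1", "f2", "f3", "target"])
optimal_w = rng.normal(size=3)
optimal_b = 0.5

# Vectorized training MSE: one matrix product replaces the per-row loop.
X = final_ds.loc[:, final_ds.columns != "target"].to_numpy()
y = final_ds["target"].to_numpy()
mse = np.mean((y - (X @ optimal_w + optimal_b)) ** 2)
```

This computes exactly the same quantity as the loop, so it doubles as a sanity check on the loop's indexing.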

I am getting an error of around 582. Is this the correct way to test it?
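A direct way to cross-check is to compare against the closed-form least-squares solution: on the same training data, a correctly trained SGD should land close to the closed-form weights and MSE, and that MSE is near the irreducible noise, not exactly 0. A sketch on synthetic data (all names and values here are illustrative, not from the question):

```python
import numpy as np

# Synthetic linear data standing in for the scaled Boston features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
true_w = rng.normal(size=13)
true_b = 1.5
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

# Closed-form least squares: append a bias column and solve with lstsq.
Xb = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
w_ref, b_ref = coef[:-1], coef[-1]

# This MSE is the floor for any linear model on this data; a correct SGD
# should land near it (roughly the noise variance), not at exactly zero.
mse_ref = np.mean((y - (X @ w_ref + b_ref)) ** 2)
```

If your SGD's W, B and training MSE are far from the closed-form ones (an MSE of ~582 strongly suggests this), the implementation has a bug rather than the test being wrong.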

PS:

The optimal W is: [-0.22178286, -1.30943816, -0.61933446, 1.07290039, -0.96299363, -0.59459475, -1.4094494, 0.2022922, -1.45901487, 0.00561458, -0.31858595, -0.71790656, -0.97285501]

The optimal B is: 1.179334612848228

0 Answers