Question

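I have implemented a method for computing the betas of an OLS regression in Python. Now I would like to score my model using R^2. For my assignment I am not allowed to use Python packages to do this, so I have to implement a method from scratch.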

#load the data
import numpy as np
import pandas as pd
from numpy.linalg import inv
from sklearn.datasets import load_boston
boston = load_boston()

# Set the X and y variables. 
X = boston.data
y = boston.target

#append ones to my X matrix. 
int = np.ones(shape=y.shape)[..., None]
X = np.concatenate((int, X), 1)

#compute betas. 
betas = inv(X.transpose().dot(X)).dot(X.transpose()).dot(y)

# extract the feature names of the boston data set and prepend the 
#intercept
names = np.insert(boston.feature_names, 0, 'INT')

# collect results into a DataFrame for pretty printing
results = pd.DataFrame({'coeffs':betas}, index=names)

#print the results
print(results)

            coeffs
INT      36.491103
CRIM     -0.107171
ZN        0.046395
INDUS     0.020860
CHAS      2.688561
NOX     -17.795759
RM        3.804752
AGE       0.000751
DIS      -1.475759
RAD       0.305655
TAX      -0.012329
PTRATIO  -0.953464
B         0.009393
LSTAT    -0.525467

Now I would like to implement R^2 to score my model on this data (or any other data). (See here: https://en.wikipedia.org/wiki/Coefficient_of_determination)

My problem is that I am not sure how to compute the numerator, SSE. In code it would look something like this:

#numerator
sse = sum((Y - yhat ** 2)

where Y is the Boston house prices and yhat is the predicted prices for those houses. But how do I compute the yhat term?

Answer 1

yhat is your estimate for a given observation. You can get the estimates for all observations at once with a dot product: X.dot(betas).
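As a minimal sketch of that dot product, here is a toy example. The data are hypothetical (four points generated exactly by y = 1 + 2x, not the Boston set), chosen so the result is easy to check by eye; the beta formula is the same one used in the question.

```python
import numpy as np

# Hypothetical toy data: 4 observations, 1 feature, generated exactly by y = 1 + 2*x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with a leading column of ones for the intercept
X = np.column_stack((np.ones_like(x), x))

# Same normal-equations formula as in the question
betas = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# One matrix-vector product gives the prediction for every row of X
y_hat = X.dot(betas)
```

Because the toy data contain no noise, the fitted betas should recover the intercept 1 and slope 2, and y_hat should match y up to floating-point error.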

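Your sum of squared errors then looks like this (note the correction to the version you posted: you need to square the difference, i.e. square the errors):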

y_hat = X.dot(betas)
errors = y - y_hat 
sse = (errors ** 2).sum()

Your total sum of squares:

tss = ((y - y.mean()) ** 2).sum()

And the resulting R-squared (coefficient of determination):

r2 = 1 - sse / tss
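Putting the pieces together, the scoring step could be wrapped in a small function along these lines. This is only a sketch: the name r_squared, the synthetic data, and the coefficients in true_betas are assumptions made for illustration, not part of the original question.

```python
import numpy as np

def r_squared(X, y, betas):
    """Return R^2 = 1 - SSE / TSS for a linear model with coefficients betas."""
    y_hat = X.dot(betas)                 # predictions for every observation
    sse = ((y - y_hat) ** 2).sum()       # sum of squared errors
    tss = ((y - y.mean()) ** 2).sum()    # total sum of squares
    return 1.0 - sse / tss

# Hypothetical synthetic data: 200 rows, 3 features, plus Gaussian noise
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 3))
X = np.column_stack((np.ones(n), features))   # prepend the intercept column
true_betas = np.array([2.0, 1.0, -3.0, 0.5])  # illustrative values only
y = X.dot(true_betas) + rng.normal(scale=0.5, size=n)

# Fit with the normal equations, as in the question, then score the fit
betas = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
r2 = r_squared(X, y, betas)
```

One way to sanity-check the result: for an OLS fit that includes an intercept, R^2 should equal the squared correlation between y and the predictions, np.corrcoef(y, X.dot(betas))[0, 1] ** 2.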

Also, I wouldn't use int as a variable name, to avoid clobbering the built-in int function (just call it ones or const).
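For instance, the intercept step from the question could be written as below. The small arrays are hypothetical stand-ins for boston.target and boston.data, and n_rows is only there to show that the built-in int stays usable afterwards.

```python
import numpy as np

# Hypothetical stand-ins for boston.target and boston.data
y = np.array([3.0, 5.0, 7.0])
features = np.array([[1.0], [2.0], [3.0]])

# Same construction as in the question, with the column named `ones`
ones = np.ones(shape=y.shape)[..., None]
X = np.concatenate((ones, features), 1)

# The built-in int is not shadowed, so it can still be called normally
n_rows = int(X.shape[0])
```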
