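I am trying to replicate the functionality of Statsmodels's weighted least squares (WLS) function using NumPy's ordinary least squares (OLS) function (NumPy refers to OLS simply as "least squares").
In other words, I want to compute WLS in NumPy. I have been using this Stack Overflow post as a reference, but I get drastically different R² values when moving from Statsmodels to NumPy.
Take the following example code, which reproduces the discrepancy: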
import numpy as np
import statsmodels.formula.api as smf
import pandas as pd
# Test Data
patsy_equation = "y ~ C(x) - 1" # Use minus one to get rid of the hidden intercept of "+ 1"
weight = np.array([0.37, 0.37, 0.53, 0.754])
y = np.array([0.23, 0.55, 0.66, 0.88])
x = np.array([3, 3, 3, 3])
d = {"x": x.tolist(), "y": y.tolist()}
data_df = pd.DataFrame(data=d)
# Weighted Least Squares from Statsmodels API
statsmodel_model = smf.wls(formula=patsy_equation, weights=weight, data=data_df)
statsmodel_r2 = statsmodel_model.fit().rsquared
# Weighted Least Squares from Numpy API
Aw = x.reshape((-1, 1)) * np.sqrt(weight[:, np.newaxis]) # Multiply two column vectors
Bw = y * np.sqrt(weight)
numpy_model, numpy_resid = np.linalg.lstsq(Aw, Bw, rcond=None)[:2]
numpy_r2 = 1 - numpy_resid / (Bw.size * Bw.var())
print("Statsmodels R²: " + str(statsmodel_r2))
print("Numpy R²: " + str(numpy_r2[0]))
After running this code, I get the following results:
Statsmodels R²: 2.220446049250313e-16
Numpy R²: 0.475486515775414
Clearly something is wrong here! Can anyone point out my mistake? Am I misunderstanding the formula?
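One possible source of the gap, offered as a sketch rather than a confirmed diagnosis: `Bw.var()` is the unweighted variance of the already-scaled `y`, whereas a weighted R² would normally compare the weighted residual sum of squares against a weighted total sum of squares centered on the weighted mean of `y`. The snippet below assumes that this is the convention Statsmodels follows when it detects a constant column in the design matrix (here `C(x)` has a single level, so the design is a column of ones). The helper name `weighted_r2` is hypothetical and only used for illustration.

```python
import numpy as np


def weighted_r2(A, y, w):
    """R² of a WLS fit, using a weighted, centered total sum of squares.

    Assumption: this mirrors the convention used for models that
    contain a constant; it is a sketch, not the library's own code.
    """
    sw = np.sqrt(w)
    Aw = A * sw[:, np.newaxis]   # scale each row of the design matrix
    yw = y * sw                  # scale the response the same way
    beta = np.linalg.lstsq(Aw, yw, rcond=None)[0]

    # Weighted residual sum of squares, computed explicitly
    resid = yw - Aw @ beta
    ssr = resid @ resid

    # Weighted total sum of squares around the weighted mean of y
    y_bar = np.average(y, weights=w)
    tss = np.sum(w * (y - y_bar) ** 2)
    return 1.0 - ssr / tss


# Same test data as in the question
weight = np.array([0.37, 0.37, 0.53, 0.754])
y = np.array([0.23, 0.55, 0.66, 0.88])
x = np.array([3, 3, 3, 3]).reshape((-1, 1))

r2 = weighted_r2(x, y, weight)
print("Numpy weighted R²: " + str(r2))
```

Because every `x` equals 3, the fit can only reproduce the weighted mean of `y`, so the weighted residual and total sums of squares coincide and this R² should be zero up to floating-point rounding, which is in line with the Statsmodels value above.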