我使用scipy.odr使用权重拟合数据,但我不知道如何获得拟合优度或R平方的度量。有没有人建议如何使用函数存储的输出获得此度量?
答案 0 :(得分:6)
Output
的res_var
属性是拟合的所谓降低卡方值,是适合度统计量的流行选择。但是,It is somewhat problematic用于非线性拟合。您可以直接查看残差(out.delta
残差为X
,out.eps
残差为Y
。如链接文件中所建议的那样,实施交叉验证或引导方法来确定拟合度,留给读者练习。
答案 1 :(得分:2)
这应该对您有用:
import numpy as np
from scipy import stats, odr
def linear_func(B, x):
"""
From https://docs.scipy.org/doc/scipy/reference/odr.html
Linear function y = m*x + b
"""
# B is a vector of the parameters.
# x is an array of the current x values.
# x is in the same format as the x passed to Data or RealData.
#
# Return an array in the same format as y passed to Data or RealData.
return B[0] * x + B[1]
np.random.seed(0)
sigma_x = .1
sigma_y = .15
N = 100
x_star = np.linspace(0, 10, N)
x = np.random.normal(x_star, sigma_x, N)
# the true underlying function is y = 2*x_star + 1
y = np.random.normal(2*x_star + 1, sigma_y, N)
linear = odr.Model(linear_func)
dat = odr.Data(x, y, wd=1./sigma_x**2, we=1./sigma_y**2)
this_odr = odr.ODR(dat, linear, beta0=[1., 0.])
odr_out = this_odr.run()
# degrees of freedom are n_samples - n_parameters
df = N - 2 # equivalently, df = odr_out.iwork[10]
t_stat = odr_out.beta[0] / odr_out.sd_beta[0] # t statistic for the slope parameter
p_val = stats.t.sf(np.abs(t_stat), df) * 2
print('Recovered equation: y={:3.2f}x + {:3.2f}, t={:3.2f}, p={:.2e}'.format(odr_out.beta[0], odr_out.beta[1], t_stat, p_val))
Recovered equation: y=2.00x + 1.01, t=239.63, p=1.76e-137
答案 2 :(得分:0)
正如 R. Ken 所提到的,残差的卡方或方差是其中之一
常用的拟合优度检验。 ODR 存储平方和
out.sum_square
中的残差,您可以自己验证
out.res_var = out.sum_square/degrees_freedom
对应于通常所说的缩减卡方:即卡方测试结果除以其期望值。
至于线性回归中另一个非常流行的拟合优度估计器 R 平方及其调整版本,我们可以定义函数
import numpy as np
def R_squared(observed, predicted, uncertainty=1):
""" Returns R square measure of goodness of fit for predicted model. """
weight = 1./uncertainty
return 1. - (np.var((observed - predicted)*weight) / np.var(observed*weight))
def adjusted_R(x, y, model, popt, unc=1):
"""
Returns adjusted R squared test for optimal parameters popt calculated
according to W-MN formula, other forms have different coefficients:
Wherry/McNemar : (n - 1)/(n - p - 1)
Wherry : (n - 1)/(n - p)
Lord : (n + p - 1)/(n - p - 1)
Stein : (n - 1)/(n - p - 1) * (n - 2)/(n - p - 2) * (n + 1)/n
"""
# Assuming you have a model with ODR argument order f(beta, x)
# otherwise if model is of the form f(x, a, b, c..) you could use
# R = R_squared(y, model(x, *popt), uncertainty=unc)
R = R_squared(y, model(popt, x), uncertainty=unc)
n, p = len(y), len(popt)
coefficient = (n - 1)/(n - p - 1)
adj = 1 - (1 - R) * coefficient
return adj, R
从 ODR 运行的输出中,您可以在 out.beta
中找到模型参数的最佳值,此时我们拥有计算 R 平方所需的一切。
from scipy import odr
def lin_model(beta, x):
"""
Linear function y = m*x + q
slope m, constant term/y-intercept q
"""
return beta[0] * x + beta[1]
linear = odr.Model(lin_model)
data = odr.RealData(x, y, sx=sigma_x, sy=sigma_y)
init = odr.ODR(data, linear, beta0=[1, 1])
out = init.run()
adjusted_Rsq, Rsq = adjusted_R(x, y, lin_model, popt=out.beta)