I am using statsmodels.api to compute the statistical parameters of an OLS fit between two variables:
import numpy as np
import statsmodels.api as sm
from scipy import stats

def computeStats(x, y, yName):
    '''
    Takes as an argument an array, and a string for the array name.
    Uses Ordinary Least Squares to compute the statistical parameters for the
    array against log(z), and determines the equation for the line of best fit.
    Returns the results summary, residuals, statistical parameters in a list, and the
    best fit equation.
    '''
    # Mask NaN values in both axes
    mask = ~np.isnan(y) & ~np.isnan(x)
    # Compute model parameters
    model = sm.OLS(y, sm.add_constant(x), missing='drop')
    results = model.fit()
    residuals = results.resid
    # Compute fit parameters
    params = stats.linregress(x[mask], y[mask])
    fit = params[0]*x + params[1]
    fitEquation = '$(%s)=(%.4g \\pm %.4g) \\times redshift+%.4g$' % (yName,
                  params[0],  # slope
                  params[4],  # stderr in slope
                  params[1])  # y-intercept
    return results, residuals, params, fit, fitEquation
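The second part of the function (using stats.linregress) works fine with the masked values, but statsmodels does not. When I try to plot the residuals against the x values with plt.scatter(x, resids), the sizes do not match: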
ValueError: x and y must be the same size
because there are 29007 x values but only 11763 residuals (the number of y values left after masking). I tried changing the model variable to
model = sm.OLS(y[mask], sm.add_constant(x[mask]), missing= 'drop')
but this had no effect.
How can I make a scatter plot of the residuals against the x values they correspond to?
Answer 0 (score: 1)
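Hi @jim421616. Because statsmodels drops the rows that contain missing values, you should plot against the model's exog variable, which holds only the rows that were actually used in the fit. Column 0 of exog is the constant and column 1 is x. In the line below, model is the fitted results object; with the names in your function that would be results.model.exog[:, 1] and results.resid.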
plt.scatter(model.model.exog[:,1], model.resid)
For reference, here is a complete dummy example:
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Generate data
x = np.random.rand(1000)
y = np.sin(x*25) + 0.1*np.random.rand(1000)
# Set some values to NaN
y[np.random.choice(np.arange(1000), size=100)] = np.nan
x[np.random.choice(np.arange(1000), size=80)] = np.nan
# Fit the model; rows with a NaN in x or y are dropped
model = sm.OLS(y, sm.add_constant(x), missing='drop').fit()
print(model.summary())
# plot
plt.scatter(model.model.exog[:,1], model.resid)
plt.show()