Question

I am using statsmodels.api to compute the statistical parameters of an OLS fit between two variables:

import numpy as np
import statsmodels.api as sm
from scipy import stats

def computeStats(x, y, yName):
    '''
    Takes two arrays, x and y, and a string with the name of the y array.
    Uses Ordinary Least Squares to compute the statistical parameters for
    y against x, and determines the equation for the line of best fit.
    Returns the results summary, residuals, statistical parameters, the
    fitted values, and the best fit equation.
    '''

    #   Mask NaN values in both axes
    mask = ~np.isnan(y) & ~np.isnan(x)
    #   Compute model parameters
    model = sm.OLS(y, sm.add_constant(x), missing='drop')
    results = model.fit()
    residuals = results.resid

    #   Compute fit parameters
    params = stats.linregress(x[mask], y[mask])
    fit = params[0]*x + params[1]
    #   Raw string so that \pm and \times reach matplotlib's mathtext unchanged
    fitEquation = r'$(%s)=(%.4g \pm %.4g) \times redshift+%.4g$' % (yName,
                    params[0],  #   slope
                    params[4],  #   stderr in slope
                    params[1])  #   y-intercept
    return results, residuals, params, fit, fitEquation

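The second part of the function (using stats.linregress) works fine with the masked values, but statsmodels does not. When I try to plot the residuals against the x values with plt.scatter(x, resids), the sizes don't match: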

ValueError: x and y must be the same size

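because there are 29007 x values and 11763 residuals (which is the number of y values that make it through the masking procedure). I tried changing the model variable to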

model = sm.OLS(y[mask], sm.add_constant(x[mask]), missing= 'drop')

but this had no effect.

How can I make a scatter plot of the residuals against the x values they correspond to?

Answer 1

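Hi @jim421616, since statsmodels drops the rows with missing values, you should plot against the model's exog variable, which holds only the rows that were actually used in the fit. Here `model` is the fitted results object, as in the full example further down: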

plt.scatter(model.model.exog[:,1], model.resid)

For reference, a complete dummy example:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# generate data
x = np.random.rand(1000)
y = np.sin(x*25) + 0.1*np.random.rand(1000)

# set some values to NaN
y[np.random.choice(np.arange(1000), size=100)] = np.nan
x[np.random.choice(np.arange(1000), size=80)] = np.nan


# fit model; rows with a NaN in x or y are dropped
model = sm.OLS(y, sm.add_constant(x), missing='drop').fit()
print(model.summary())

# plot 
plt.scatter(model.model.exog[:,1], model.resid)
plt.show()

Plotting residuals of masked values with statsmodels
