Polynomial fitting with a pandas DataFrame and NumPy

Date: 2018-12-13 21:45:33

Tags: python pandas numpy statsmodels data-fitting

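I have a script that holds a DataFrame that looks like this:

[image of the DataFrame]

I convert some of its columns to NumPy arrays for processing. I then pass two of those arrays to a small function I wrote, which uses statsmodels.api to compute a linear regression. The function returns the fit statistics and the equation of the linear fit: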

import numpy as np
import statsmodels.api as sm
from scipy import stats

def computeLinearStats(x, y, yName, calc_tau=False):
    '''
    Takes as an argument two numpy arrays, one for x and one y, and a string for the
    name of the y-variable, and a boolean for whether to calculate tau.
    Uses Ordinary Least Squares to compute the statistical parameters for the
    array against log(z), and determines the equation for the line of best fit.
    Returns the OLS results object, the residuals, the linregress parameters,
    the fitted y-values, the best-fit equation string, and Kendall's tau.
    '''

    #   Mask NaN values in both axes
    mask = ~np.isnan(y) & ~np.isnan(x)
    #   Compute model parameters
    model = sm.OLS(y[mask], sm.add_constant(x[mask]), missing= 'drop')
    results = model.fit()
    residuals = results.resid
    if calc_tau:
        tau = stats.kendalltau(x, y, nan_policy= 'omit')
    else:
        tau = [1, 1]    #   Use this to exclude computation of tau

    #   Compute fit parameters
    params = stats.linregress(x[mask], y[mask])
    fit = params[0]*x + params[1]
    fitEquation = '$(%s)=(%.4g \\pm %.4g) \\times log_{10}(redshift)+%.4g$' % (yName,
                    params[0],  #   slope
                    params[4],  #   stderr in slope
                    params[1])  #   y-intercept
    return results, residuals, params, fit, fitEquation, tau

For example, say I am looking for a linear fit between log(z) and "B-I" from the DataFrame. After computing those variables, I call

results, residuals, params, fit, equation, tau = qf.computeLinearStats(log_z, (B-I), 'B-I', calc_tau = False)

to get the linear fit.
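For readers following along, here is a minimal sketch of how the two input arrays might be derived from the DataFrame. The post only confirms that `sources` has a `z` column; the `B` and `I` column names and all the numbers below are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `sources` DataFrame; the column names
# "B" and "I" and every value here are invented for illustration.
sources = pd.DataFrame({
    "z": [0.1, 0.5, 1.0, 2.0, np.nan],
    "B": [20.1, 21.3, 22.0, np.nan, 23.5],
    "I": [19.0, 19.8, 20.1, 21.0, 21.9],
})

# Convert the columns of interest to plain NumPy arrays.
log_z = np.log10(sources["z"].to_numpy())
color = (sources["B"] - sources["I"]).to_numpy()  # the "B-I" quantity

# Keep only the rows where both values are present,
# mirroring the mask used inside computeLinearStats.
mask = ~np.isnan(log_z) & ~np.isnan(color)
```

Arrays built this way can be passed directly as the `x` and `y` arguments of `computeLinearStats`, which applies the same kind of mask internally.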

This all works fine, but now I need to fit a polynomial rather than a straight line.

I have tried

sources['log_z'] = np.log10(sources.z)
mask = ~np.isnan((B-I)) & ~np.isnan(log_z)
model = ols(formula='(B-I) ~ log_z', data = [log_z[mask], (B-I)[mask]]).fit()

model = ols(formula='(B-I) + np.power((U-R),2) ~ log_z', data = [log_z[mask], (B-I)[mask]]).fit()

but I get

PatsyError: Error evaluating factor: TypeError: list indices must be integers or slices, not str
    (B-I) ~ log_z
            ^^^^^

even though x and y are both arrays, not strings.
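A likely cause of this error: the formula interface looks up each name in the formula (here `log_z`) by indexing `data` with that name, so `data` has to be dict-like, such as a DataFrame or a dict of arrays. Indexing a list with the string `'log_z'` produces exactly this TypeError. Note also that inside a formula `-` is a formula operator rather than arithmetic subtraction, so it is simpler to precompute the colour as its own column. The sketch below assumes synthetic data and a hypothetical column name `B_I`.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in data (invented for illustration): a linear relation
# between log_z and the precomputed colour, with a few missing values.
rng = np.random.default_rng(0)
log_z = rng.uniform(-1.0, 0.5, 200)
B_I = 1.5 * log_z + 0.3 + rng.normal(0.0, 0.05, 200)
B_I[::25] = np.nan

# `data` must map names to columns, so wrap the arrays in a DataFrame.
# "B_I" is a hypothetical column name holding the precomputed B - I values.
df = pd.DataFrame({"log_z": log_z, "B_I": B_I})

# Rows containing NaN are dropped by the formula interface
# (missing="drop" is spelled out here for clarity).
model = ols("B_I ~ log_z", data=df, missing="drop").fit()
print(model.params)
```

With a DataFrame there is no need to build the NaN mask by hand for the fit itself, although the mask is still useful if the same rows are needed elsewhere.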

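What is the easiest way to get a polynomial fit in this case, for example something like (B-I) + (U-R)**2 against log(z)? Slide 41 onward on this site looks like a starting point, but I am confused about how to apply it.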
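One possible reading of the question is a polynomial in log(z), that is, fitting the colour as a quadratic function of `log_z`. Under that assumption, a minimal sketch with two equivalent routes follows: the statsmodels formula interface, where `I(...)` protects the arithmetic so that `**` means a power, and `np.polyfit`. The data and the column name `color` are invented for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in data (invented): a quadratic relation plus noise,
# with a few missing values.
rng = np.random.default_rng(1)
log_z = rng.uniform(-1.0, 0.5, 300)
color = 0.8 * log_z**2 - 0.4 * log_z + 1.2 + rng.normal(0.0, 0.05, 300)
color[::30] = np.nan

df = pd.DataFrame({"log_z": log_z, "color": color})

# Route 1: statsmodels formula. I(...) makes patsy treat log_z**2 as
# ordinary Python arithmetic instead of a formula operator.
quad = ols("color ~ log_z + I(log_z**2)", data=df).fit()

# Route 2: NumPy. polyfit does not skip NaN, so mask them first.
mask = ~np.isnan(log_z) & ~np.isnan(color)
coeffs = np.polyfit(log_z[mask], color[mask], deg=2)  # highest power first

print(quad.params)
print(coeffs)
```

The statsmodels route also gives standard errors and a summary via `quad.summary()`, while `np.polyfit` is the shorter choice when only the coefficients are needed. The two should agree up to ordering: statsmodels lists the intercept first and `np.polyfit` lists the highest power first.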

0 answers:

No answers