Inaccurate curve fitting in Python

Question

Temp    k(T)
298   6.66E-63
300   1.48E-62
350   3.58E-55
400   1.25E-49
450   2.57E-45
500   7.30E-42
550   4.90E-39
600   1.12E-36
650   1.11E-34
700   5.72E-33
750   1.75E-31
800   3.49E-30
850   4.92E-29
900   5.17E-28
950   4.24E-25
1000  2.83E-26

Above is the given kinetics data. I am trying to fit this data and plot it.


import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
plt.style.use('ggplot')

#Generate data
df=pd.read_excel('py_curvefit.xlsx')
T=df.Temp  #xdata
def reacKine(T,A,n,Ea):
    return A*((T/298)**n)*np.exp(-Ea/(0.008314*T))
kt=df['k(T)']  #ydata
#rectifying an erroneous value      
kt[14]=4.24*10**(-27)  
popt,pcov=curve_fit(reacKine,T,kt)
A,n,Ea=popt
plt.plot(T,np.log(kt),'g-',label='given data')
plt.plot(T,np.log(reacKine(T,*popt)),'ro',label='fit')
plt.xlabel('Temperature [K]')
plt.ylabel('log of reaction coefficient')
plt.legend(loc='best')
plt.show()

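It reports that the optimal parameters for the function could not be found. How can I correct this? I would like to see a proper fit. Is it because of the exponential term?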

Answer 1

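This is a sensitive problem (as is often the case when exponentials are involved). For a problem like this, it is important to have a very good initial guess for the parameters.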
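To get a feel for how much the starting point matters, the sketch below evaluates the model at the point (1, 1, 1), which is the starting point curve_fit uses when no p0 is given, and compares it with the first and last data points. The names default_guess and orders_off are illustrative only and are not part of the original post.

```python
import numpy as np

def reacKine(T, A, n, Ea):
    return A * ((T / 298) ** n) * np.exp(-Ea / (0.008314 * T))

# First and last rows of the data table (illustrative subset)
T = np.array([298.0, 1000.0])
kt = np.array([6.66e-63, 2.83e-26])

# Model evaluated with every parameter set to 1
default_guess = reacKine(T, 1.0, 1.0, 1.0)

# How many orders of magnitude the model is above the data
orders_off = np.log10(default_guess / kt)

print("model at (1, 1, 1):", default_guess)
print("orders of magnitude above the data:", orders_off)
```

With the model tens of orders of magnitude away from the data at the start, it is not surprising that the optimizer struggles to make progress.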

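If you experiment with the parameters, you will find that A must be very small. The default initial guess that curve_fit uses for all parameters is 1, and 1 is far too large for A. If I use 1e-10 as the initial guess for A,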

popt, pcov = curve_fit(reacKine, T, kt, p0=(1e-10, 1, 1))

I get the following error from curve_fit:

RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 800.

So let's increase maxfev to 2000:

popt, pcov = curve_fit(reacKine, T, kt, p0=(1e-10, 1, 1), maxfev=2000)

I get the same error. When I increase it to 100000, the function succeeds.

Here is a script with the updated call to curve_fit, followed by the plot that the script generates.

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt


T = np.array([298, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800,
              850, 900, 950, 1000])
kt = np.array([6.66e-63, 1.48e-62, 3.58e-55, 1.25e-49, 2.57e-45, 7.30e-42,
               4.90e-39, 1.12e-36, 1.11e-34, 5.72e-33, 1.75e-31, 3.49e-30,
               4.92e-29, 5.17e-28, 4.24e-27, 2.83e-26])

def reacKine(T,A,n,Ea):
    return A*((T/298)**n)*np.exp(-Ea/(0.008314*T))

popt, pcov = curve_fit(reacKine, T, kt, p0=(1e-10, 1, 1), maxfev=100000)


plt.plot(T, kt, '.', label='data')
tt = np.linspace(T[0], T[-1], 160)
kk = reacKine(tt, *popt)
semilogy = True
if semilogy:
    plt.semilogy(tt, kk, 'k-', alpha=0.3, label='fit')
    results_xy = (700, 1e-45)
else:
    plt.plot(tt, kk, 'k-', alpha=0.3, label='fit')
    results_xy = (300, 1.5e-26)

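# Pass the label positionally: the keyword was renamed from `s` to `text`
# in newer matplotlib releases. Backslashes are doubled to avoid invalid
# escape sequences in a non-raw string.
plt.annotate(('Fit Results:\n  $A\\,$  = %.4g\n  $n\\,$  = %.4g\n  $E_{a}$ = %.4g' %
              tuple(popt)),
             xy=results_xy)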
plt.xlabel('T')
plt.ylabel('k(T)')
plt.legend(framealpha=1, shadow=True)
plt.show()
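One way to arrive at a reasonable starting point without trial and error is to note that the logarithm of the model, ln k = ln A + n ln(T/298) - Ea/(0.008314 T), is linear in (ln A, n, Ea). The sketch below is not part of the original script: it solves that linear least-squares problem with np.linalg.lstsq and turns the result into a tuple that could be passed as p0. The names R, X, coef and p0 are assumptions made for illustration, and because the fit is done in log space it weights all points equally, so its result will not coincide with a fit of k(T) itself.

```python
import numpy as np

R = 0.008314  # same constant that appears in reacKine

T = np.array([298, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800,
              850, 900, 950, 1000], dtype=float)
kt = np.array([6.66e-63, 1.48e-62, 3.58e-55, 1.25e-49, 2.57e-45, 7.30e-42,
               4.90e-39, 1.12e-36, 1.11e-34, 5.72e-33, 1.75e-31, 3.49e-30,
               4.92e-29, 5.17e-28, 4.24e-27, 2.83e-26])

# Design matrix for ln k = ln A + n*ln(T/298) + Ea*(-1/(R*T))
X = np.column_stack([np.ones_like(T), np.log(T / 298), -1.0 / (R * T)])

# Ordinary linear least squares in log space
coef, *_ = np.linalg.lstsq(X, np.log(kt), rcond=None)
lnA0, n0, Ea0 = coef
A0 = np.exp(lnA0)

# Candidate initial guess for a nonlinear fit of k(T)
p0 = (A0, n0, Ea0)

# Residuals of the log-space fit, as a sanity check
log_resid = X @ coef - np.log(kt)

print("p0 =", p0)
print("largest |residual| in ln k:", np.abs(log_resid).max())
```

The resulting tuple could then be tried in place of the hand-picked guess, for example curve_fit(reacKine, T, kt, p0=p0, maxfev=100000). Whether that converges faster depends on the SciPy version and is not something the original answer tested.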

P.S. @MNewville might suggest a better approach using lmfit.

Answer 2

Using the code below with the pyeq3 fitting library, I got the following parameters and fit statistics:

Fitting target of sum of squared absolute error = 7.93711173898e-62
Fitted Parameters:
    A = 3.6814349968228987E-12
    Ea = 2.8663497636217801E+02
    n = 1.6329619761384757E+00

Degress of freedom error 13
Degress of freedom regression 2
Root Mean Squared Error (RMSE): 7.04322002841e-32
R-squared: 0.9999999999
R-squared adjusted: 0.999999999884
Model F-statistic: 64790385432.5
Model F-statistic p-value: 1.11022302463e-16
Model log-likelihood: 1124.98750379
Model AIC: -140.248437973
Model BIC: -140.103577588

Individual Parameter Statistics:
Coefficient A = 3.6814349968228987E-12
    std error: 1.67464E-25
    t-stat: 8.99615E+00
    p-stat: 6.05074E-07
    95 percent confidence intervals: [2.79736E-12, 4.56551E-12]
Coefficient Ea = 2.8663497636217801E+02
    std error: 1.69556E-01
    t-stat: 6.96102E+02
    p-stat: 0.00000E+00
    95 percent confidence intervals: [2.85745E+02, 2.87525E+02]
Coefficient n = 1.6329619761384757E+00
    std error: 2.59159E-03
    t-stat: 3.20770E+01
    p-stat: 9.19265E-14
    95 percent confidence intervals: [1.52298E+00, 1.74294E+00]

Coefficient Covariance Matrix:
    [ 2.74285036e+37   2.75990923e+49  -3.41210380e+48]
    [ 2.75990923e+49   2.77711499e+61  -3.43328442e+60]
    [-3.41210380e+48  -3.43328442e+60   4.24469499e+59]

import pyeq3

functionString = 'A*((X/298)**n)*exp(-Ea/(0.008314*X))'

data = '''
298   6.66e-63
300   1.48e-62
350   3.58e-55
400   1.25e-49
450   2.57e-45
500   7.30e-42
550   4.90e-39
600   1.12e-36
650   1.11e-34
700   5.72e-33
750   1.75e-31
800   3.49e-30
850   4.92e-29
900   5.17e-28
950   4.24e-27
1000  2.83e-26
'''

# note that the constructor is passed the function string here
equation = pyeq3.Models_2D.UserDefinedFunction.UserDefinedFunction(inUserFunctionString = functionString)

pyeq3.dataConvertorService().ConvertAndSortColumnarASCII(data, equation, False)

equation.Solve()

##########################################################

print("Equation:", equation.GetDisplayName(), str(equation.GetDimensionality()) + "D")
print("Fitting target of", equation.fittingTargetDictionary[equation.fittingTarget], '=', equation.CalculateAllDataFittingTarget(equation.solvedCoefficients))
print("Fitted Parameters:")
for i in range(len(equation.solvedCoefficients)):
    print("    %s = %-.16E" % (equation.GetCoefficientDesignators()[i], equation.solvedCoefficients[i]))


equation.CalculateModelErrors(equation.solvedCoefficients, equation.dataCache.allDataCacheDictionary)
print()

##########################################################

equation.CalculateCoefficientAndFitStatistics()

if equation.upperCoefficientBounds or equation.lowerCoefficientBounds:
    print('You entered coefficient bounds. Parameter statistics may')
    print('not be valid for parameter values at or near the bounds.')
    print()

print('Degress of freedom error',  equation.df_e)
print('Degress of freedom regression',  equation.df_r)

if equation.rmse is None:
    print('Root Mean Squared Error (RMSE): n/a')
else:
    print('Root Mean Squared Error (RMSE):',  equation.rmse)

if equation.r2 is None:
    print('R-squared: n/a')
else:
    print('R-squared:',  equation.r2)

if equation.r2adj is None:
    print('R-squared adjusted: n/a')
else:
    print('R-squared adjusted:',  equation.r2adj)

if equation.Fstat is None:
    print('Model F-statistic: n/a')
else:
    print('Model F-statistic:',  equation.Fstat)

if equation.Fpv is None:
    print('Model F-statistic p-value: n/a')
else:
    print('Model F-statistic p-value:',  equation.Fpv)

if equation.ll is None:
    print('Model log-likelihood: n/a')
else:
    print('Model log-likelihood:',  equation.ll)

if equation.aic is None:
    print('Model AIC: n/a')
else:
    print('Model AIC:',  equation.aic)

if equation.bic is None:
    print('Model BIC: n/a')
else:
    print('Model BIC:',  equation.bic)


print()
print("Individual Parameter Statistics:")
for i in range(len(equation.solvedCoefficients)):
    if equation.tstat_beta is None:
        tstat = 'n/a'
    else:
        tstat = '%-.5E' %  ( equation.tstat_beta[i])

    if equation.pstat_beta is None:
        pstat = 'n/a'
    else:
        pstat = '%-.5E' %  ( equation.pstat_beta[i])

    if equation.sd_beta is not None:
        print("Coefficient %s = %-.16E, std error: %-.5E" % (equation.GetCoefficientDesignators()[i], equation.solvedCoefficients[i], equation.sd_beta[i]))
    else:
        print("Coefficient %s = %-.16E, std error: n/a" % (equation.GetCoefficientDesignators()[i], equation.solvedCoefficients[i]))
    print("          t-stat: %s, p-stat: %s, 95 percent confidence intervals: [%-.5E, %-.5E]" % (tstat,  pstat, equation.ci[i][0], equation.ci[i][1]))

print()
print("Coefficient Covariance Matrix:")
for i in  equation.cov_beta:
    print(i)

Answer 3

(Obligatory?) lmfit answer:

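You might find lmfit useful. For the way this question is framed it does not add much, but it provides better abstractions for curve fitting and for the fit parameters. Similar to @WarrenWeskesser's answer, it would look like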

import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model

T = np.array([298, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800,
          850, 900, 950, 1000])
kt = np.array([6.66e-63, 1.48e-62, 3.58e-55, 1.25e-49, 2.57e-45, 7.30e-42,
           4.90e-39, 1.12e-36, 1.11e-34, 5.72e-33, 1.75e-31, 3.49e-30,
           4.92e-29, 5.17e-28, 4.24e-27, 2.83e-26])

def reacKine(T, A, n, Ea):
    return A*((T/298)**n)*np.exp(-Ea/(0.008314*T))

react_model = Model(reacKine)
params = react_model.make_params(A=2.e-11, n=1, Ea=200)
result = react_model.fit(kt, params, T=T)

print(result.fit_report())

plt.plot(T, kt, 'bo', label='data')
plt.plot(T, result.best_fit, 'r--', label='fit')

plt.xlabel('T (K)')
plt.ylabel('k(T)')
plt.legend()
plt.gca().set_yscale('log')
plt.show()

With Python27 and scipy 1.0.0, this shows a fit similar to Warren's (I left out the annotation) and prints the following fit report:

[[Model]]
    Model(reacKine)
[[Fit Statistics]]
    # function evals   = 1294
    # data points      = 16
    # variables        = 3
    chi-square         = 0.000
    reduced chi-square = 0.000
    Akaike info crit   = -2219.907
    Bayesian info crit = -2217.590
[[Variables]]
    A:    1.3365e-10 +/- 5.06e-12 (3.79%) (init= 2e-11)
    n:   -0.02392420 +/- 0.034279 (143.28%) (init= 1)
    Ea:   299.843529 +/- 0.024996 (0.01%) (init= 200)
[[Correlations]] (unreported correlations are <  0.100)
    C(A, n)                      = -0.997 
    C(A, Ea)                     =  0.117

When the fit is run with Python36 and scipy 1.0.0, the report is

[[Model]]
    Model(reacKine)
[[Fit Statistics]]
    # function evals   = 1618
    # data points      = 16
    # variables        = 3
    chi-square         = 0.000
    reduced chi-square = 0.000
    Akaike info crit   = -2289.381
    Bayesian info crit = -2287.063
[[Variables]]
    A:    3.6814e-12 +/- 4.09e-13 (11.12%) (init= 2e-11)
    n:    1.63296239 +/- 0.050923 (3.12%) (init= 1)
    Ea:   286.634973 +/- 0.411890 (0.14%) (init= 200)
[[Correlations]] (unreported correlations are <  0.100)
    C(A, n)                      = -1.000 
    C(A, Ea)                     =  1.000 
    C(n, Ea)                     = -1.000

These values are consistent with those shown by Warren and James.

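I don't have a good explanation for why the results differ between Python versions, and especially for why the correlations are > 0.999 for all variables in the Python36 run. But since the parameters are almost perfectly correlated and the number of function evaluations is large compared with the number of data points, I would not be surprised if there were false minima and a complicated correlation space.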
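To illustrate why strong parameter correlations are to be expected for this functional form, here is a minimal sketch that is not lmfit's own computation. It assumes the log-linearized model ln k = ln A + n ln(T/298) - Ea/(0.008314 T), for which the parameter covariance is proportional to the inverse of X^T X, and converts that matrix to correlation coefficients. The names X, cov, sd and corr are chosen for illustration.

```python
import numpy as np

R = 0.008314  # same constant that appears in reacKine

T = np.array([298, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800,
              850, 900, 950, 1000], dtype=float)

# Columns multiply ln A, n and Ea in the log-linearized model
X = np.column_stack([np.ones_like(T), np.log(T / 298), -1.0 / (R * T)])

# Parameter covariance up to a constant noise-variance factor,
# which cancels when converting to correlations
cov = np.linalg.inv(X.T @ X)
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

names = ["ln A", "n", "Ea"]
for i in range(3):
    for j in range(i + 1, 3):
        print("C(%s, %s) = %+.3f" % (names[i], names[j], corr[i, j]))
```

Over this temperature range ln(T/298) and 1/T vary almost in lockstep, so a change in n can be largely compensated by a change in Ea and A. That is consistent with the near-unity correlations in the reports above, although the numbers from this log-space sketch should not be expected to match them exactly.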
