Question

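I have run into a problem fitting a power-law curve to my data. I have two datasets: bins1 and bins2.

bins1 fits well using numpy.linalg.lstsq (I then use np.exp(coefs[0])*x**coefs[1] to get the power-law equation).

bins2, on the other hand, behaves strangely and gives a poor R-squared.

For both datasets the equation differs from the one Excel gives me, and the R-squared is worse.

Here is the code (and data):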

import numpy as np
import matplotlib.pyplot as plt
bins1 = np.array([[6.769318871738219667e-03,
             1.306418618130891773e-02,
             1.912138120913448383e-02,
             2.545189874466026111e-02,
             3.214689891729670401e-02,
             4.101898933375244805e-02,
             5.129862592803200588e-02,
             6.636505322669797313e-02,
             8.409809827572585494e-02,
             1.058164348650862258e-01,
             1.375849753230810046e-01,
             1.830664031837437311e-01,
             2.682454535427478137e-01,
             3.912508246490400410e-01,
             5.893271848997768680e-01,
             8.480213305038615257e-01,
             2.408136266017391058e+00,
             3.629192766488219313e+00,
             4.639246557509275171e+00,
             9.901792214343277720e+00],
             [8.501658465758301112e-04,
              1.562697718429977012e-03,
              1.902062808421856087e-04,
              4.411817741488644959e-03,
              3.409236963162485048e-03,
              1.686099657013027898e-03,
              3.643231240239608402e-03,
              2.544120616413291154e-04,
              2.549036204611017029e-02,
              3.527340723977697573e-02,
              5.038482027310990652e-02,
              5.617932487522721979e-02,
              1.620407270423956103e-01,
              1.906538999080910068e-01,
              3.180688368126549093e-01,
              2.364903188268162038e-01,
              3.267322385964683273e-01,
              9.384571074801122403e-01,
              4.419747716107813029e-01,
              9.254710022316929852e+00]]).T
bins2 = np.array([[6.522512685133712192e-03,
              1.300415548684437199e-02,
              1.888928895701269539e-02,
              2.509905819337970856e-02,
              3.239654633369139919e-02,
              4.130706234846069635e-02,
              5.123820846515786398e-02,
              6.444380072984744190e-02,
              8.235238352205621892e-02,
              1.070907072127811749e-01,
              1.403438221033725120e-01,
              1.863115065963684147e-01,
              2.670209758710758163e-01,
              4.003337413814173074e-01,
              6.549054078382223754e-01,
              1.116611087124244062e+00,
              2.438604844718367914e+00,
              3.480674117919704269e+00,
              4.410201659398489404e+00,
              6.401903059926267403e+00],
             [1.793454543936148608e-03,
              2.441092334386309615e-03,
              2.754373929745804715e-03,
              1.182752729942167062e-03,
              1.357797177773524414e-03,
              6.711673916715021199e-03,
              1.392761674092503343e-02,
              1.127957613093066511e-02,
              7.928803089359596004e-03,
              2.524609593305639915e-02,
              5.698702885370290905e-02,
              8.607729156137132465e-02,
              2.453761830112021203e-01,
              9.734443815196883176e-02,
              1.487480479168299119e-01,
              9.918002699934079791e-01,
              1.121298151253063535e+00,
              1.389239135742518227e+00,
              4.254082922056571237e-01,
              2.643453492951096440e+00]]).T

bins = bins1 #change to bins2 to see results for bins2

def fit(x,a,m): # power-law fit (based on previous studies)
    return a*(x**m)

coefs= np.linalg.lstsq(np.vstack([np.ones(len(bins[:,0])), np.log(bins[:,0]), bins[:,0]]).T, np.log(bins[:,1]))[0] # calculating fitting coefficients (a,m)
y_predict = fit(bins[:,0],np.exp(coefs[0]),coefs[1]) # prediction based on the fitted model
model_plot = plt.loglog(bins[:,0],bins[:,1],'o',label="error")
fit_line = plt.plot(bins[:,0],y_predict,'r', label="fit")
plt.ylabel('Y (bins[:,1])')
plt.xlabel('X (bins[:,0])')
plt.title('model')
plt.legend(loc='best')
plt.show() # plt.show() takes no artist arguments; it displays the current figure

def R_sqr (y,y_predict): # calculating R squared value to measure fitting accuracy
    rsdl = y - y_predict
    ss_res = np.sum(rsdl**2)
    ss_tot = np.sum((y-np.mean(y))**2)
    R2 = 1-(ss_res/ss_tot)
    R2 = np.around(R2,decimals=4)
    return R2

R2= R_sqr(bins[:,1],y_predict)
print ('(R^2 = %s)' % (R2))

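Fitted equation for bins1 [[x],[y]]: python: y = 0.337*(x)^1.223 (R^2 = 0.7773), excel: y = 0.289*(x)^1.174 (R^2 = 0.8548)

Fitted equation for bins2 [[x],[y]]: python: y = 0.509*(x)^1.332 (R^2 = -1.753), excel: y = 0.311*(x)^1.174 (R^2 = 0.9116)

These are two sample datasets out of 30. The fitting problem shows up seemingly at random across my data, and some datasets give an R-squared of around "-150"!
I tried scipy's "curve_fit", but the results were no better; in fact they were worse!

Does anyone know how to get an Excel-like fit in Python?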

Answer 1

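You are computing R-squared from Y values that have not been transformed to log space. Because the fit itself is done on log(y), the R-squared should be computed in log space too. The following change gives reasonable R-squared values: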

R2 = R_sqr(np.log(bins[:,1]), np.log(y_predict))
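
For reference, here is a minimal sketch of the whole fit with R-squared computed in log space. The function name fit_power_law_loglog and the small demo arrays are hypothetical; they do not come from the question. The sketch also assumes a plain two-column design matrix (intercept and log(x)). The code in the question adds a third column, bins[:,0], whose coefficient is never used by fit(), and that may be part of why its coefficients differ from Excel's. As far as I know, Excel's power trendline is an ordinary linear regression on log(x) and log(y), but treat that as an assumption.

```python
import numpy as np

def fit_power_law_loglog(x, y):
    """Fit y = a * x**m by least squares on log(y) = log(a) + m*log(x).

    Returns (a, m, r2_log), where r2_log is R-squared measured in log space.
    """
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))

    # Design matrix with two columns only: intercept and log(x).
    A = np.column_stack([np.ones_like(log_x), log_x])
    (log_a, m), *_ = np.linalg.lstsq(A, log_y, rcond=None)

    # R-squared in the same (log) space in which the fit was done.
    log_y_pred = log_a + m * log_x
    ss_res = np.sum((log_y - log_y_pred) ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    r2_log = 1.0 - ss_res / ss_tot

    return np.exp(log_a), m, r2_log

# Illustrative synthetic data (NOT from the question): exactly y = 3 * x**2.
x_demo = np.array([1.0, 2.0, 4.0, 8.0])
y_demo = 3.0 * x_demo ** 2
a_demo, m_demo, r2_demo = fit_power_law_loglog(x_demo, y_demo)
print(a_demo, m_demo, r2_demo)
```

With the question's data this would be called as fit_power_law_loglog(bins[:,0], bins[:,1]). An R-squared computed on the untransformed y is dominated by the few largest points, so it can be very negative even when the line looks reasonable on a log-log plot.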
