Coding a curve in a scatter plot?

Time: 2018-04-06 08:22:40

Tags: regression linear scatter

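Does anyone know how to code the trend line in this [scatter plot][1] in Python? The data consists of wind speed and typhoon distance. Also, how can I generate the linear regression equation shown in the figure?

The plot and the linear regression equation were generated in Microsoft Excel.

I tried this with matplotlib.pyplot, but I get a straight line instead of a curve.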

[1]: https://i.stack.imgur.com/NMTzP.png

1 answer:

Answer 0 (score: 1)

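Here is an example that uses scipy's curve_fit() to fit the raw data, plots the fitted function with 95% confidence intervals, and finally prints the fitted parameter values and the R-squared value. Note that this example uses curve_fit()'s default initial parameter estimates (1.0 for each parameter). These work in this example but are not always optimal. If you get a good fit to your data, the defaults are OK.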

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
import scipy.stats
from scipy.optimize import curve_fit


def func(x, scale, offset):
    return (scale * numpy.log(x)) + offset # numpy.log() is natural log


X = numpy.array([5.35, 5.45, 5.79, 5.93, 6.16, 6.70, 6.73, 6.78, 8.44, 9.77, 9.86])
Y = numpy.array([0.37, 0.48, 0.87, 1.04, 1.32, 2.05, 2.07, 2.13, 4.74, 7.06, 7.10])


print("Fitting data...")
# using curve_fit() default initial parameters
params, covariance = curve_fit(func, X, Y)

absErr = Y - func(X, *params)
Rsquared = 1.0 - (absErr.var() / Y.var())

print('Fitted parameters:', params)
print('R-squared:', Rsquared)

##########################################################
# graphics output section
# note: this function reads the module-level `params` produced by curve_fit() above
def ModelScatterConfidenceGraph(X, Y, func, graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(X, Y,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(X), max(X))
    yModel = func(xModel, *params)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    # now calculate confidence intervals
    # http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_nlin_sect026.htm
    # http://www.staff.ncl.ac.uk/tom.holderness/software/pythonlinearfit
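    # the fitted model is linear in ln(x), so the simple-linear-regression
    # confidence band is computed on the transformed regressor u = ln(x)
    uData = numpy.log(X)
    uModel = numpy.log(xModel)
    mean_u = numpy.mean(uData)
    n = len(X)
    df_e = n - len(params)
    sse = numpy.sum(numpy.square(Y - func(X, *params)))

    t_value = scipy.stats.t.ppf(0.975, df_e) # (1.0 - (a/2)) is used for two-sided t-test critical value, here a = 0.05

    # corrected sum of squares of u, then the half-width of the band at each model point
    sxx = numpy.sum(numpy.power(uData, 2.0)) - n * numpy.power(mean_u, 2.0)
    confs = t_value * numpy.sqrt((sse / df_e) * (1.0 / n + numpy.power(uModel - mean_u, 2.0) / sxx))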

    # get lower and upper confidence limits based on predicted y and confidence intervals
    upper = yModel + abs(confs)
    lower = yModel - abs(confs)

    # mask off any numbers outside the existing plot limits
    booleanMask = yModel > axes.get_ylim()[0]
    booleanMask &= (yModel < axes.get_ylim()[1])

    # color scheme improves visibility on black background lines or points
    axes.plot(xModel[booleanMask], lower[booleanMask], linestyle='solid', color='white')
    axes.plot(xModel[booleanMask], upper[booleanMask], linestyle='solid', color='white')
    axes.plot(xModel[booleanMask], lower[booleanMask], linestyle='dashed', color='blue')
    axes.plot(xModel[booleanMask], upper[booleanMask], linestyle='dashed', color='blue')

    axes.set_title('Model With 95% Confidence Intervals') # add a title
    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot


graphWidth = 800
graphHeight = 600
ModelScatterConfidenceGraph(X, Y, func, graphWidth, graphHeight)
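
The question also asks how to get the equation that Excel displays. The following is a minimal sketch, assuming the Excel trend line was a logarithmic one of the form y = a ln(x) + b, which is the same form as func above. It formats the fitted parameters as an equation string and passes explicit initial estimates through p0 instead of relying on the defaults. Because this model is linear in ln(x), numpy.polyfit on ln(X) should give the same least-squares coefficients, so it is used as a cross-check. The p0 values and the name `equation` are illustrative choices, not part of the original answer.

```python
import numpy
from scipy.optimize import curve_fit


def func(x, scale, offset):
    # logarithmic trend: y = scale * ln(x) + offset
    return (scale * numpy.log(x)) + offset


X = numpy.array([5.35, 5.45, 5.79, 5.93, 6.16, 6.70, 6.73, 6.78, 8.44, 9.77, 9.86])
Y = numpy.array([0.37, 0.48, 0.87, 1.04, 1.32, 2.05, 2.07, 2.13, 4.74, 7.06, 7.10])

# explicit initial estimates (assumed rough guesses; without p0, curve_fit() starts from 1.0 each)
params, covariance = curve_fit(func, X, Y, p0=[10.0, -15.0])
scale, offset = params

# cross-check: ordinary least squares of Y against ln(X), degree 1
slope, intercept = numpy.polyfit(numpy.log(X), Y, 1)

# build an Excel-style trend line label, handling the sign of the offset
sign = '+' if offset >= 0 else '-'
equation = 'y = {:.4f} ln(x) {} {:.4f}'.format(scale, sign, abs(offset))

print(equation)
print('polyfit slope, intercept:', slope, intercept)
```

The string in `equation` can be placed on the plot with axes.text() or used as a legend label if the equation should appear in the figure, as it does in Excel.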