Question

I have some data stored as two lists, and I plotted them using

plot(datasetx, datasety)

Then I set a trendline

trend = polyfit(datasetx, datasety, 2)  # degree 2, matching the quadratic evaluated below
trendx = []
trendy = []

for a in range(datasetx[0], (datasetx[-1]+1)):
    trendx.append(a)
    trendy.append(trend[0]*a**2 + trend[1]*a + trend[2])

plot(trendx, trendy)

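But I have a third list of data, which is the error in the original dataset. I'm fine with plotting the error bars, but what I don't know is how to use it to find the error in the coefficients of the polynomial trendline.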

So say my trendline came out to be 5x^2 + 3x + 4 = y; there needs to be some sort of error on the 5, 3 and 4 values.

Is there a tool using NumPy that will calculate this for me?

Answer 1

I think you can use the function curve_fit of scipy.optimize (documentation). A basic example of the usage:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0,4,50)
y = func(x, 5, 3, 4)
yn = y + 0.2*np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)

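According to the documentation, pcov gives:

The estimated covariance of popt. The diagonals provide the variance of the parameter estimate.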

So this way you can calculate an error estimate on the coefficients. To get the standard deviation, take the square root of the variance.
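
As a minimal sketch of that last step (the name perr and the fixed random seed are my own choices for illustration, not something taken from the scipy documentation), the one-sigma errors can be read off the diagonal of pcov:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is reproducible
x = np.linspace(0, 4, 50)
yn = func(x, 5, 3, 4) + 0.2 * rng.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)

# the variances sit on the diagonal of pcov;
# their square roots are the standard deviations of a, b and c
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip("abc", popt, perr):
    print("{0} = {1:.3f} +/- {2:.3f}".format(name, value, err))
```

The off-diagonal entries of pcov are the covariances between the coefficients, so they are worth a look if the errors seem surprisingly large.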

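Now you have an error on the coefficients, but it is based only on the deviation between the ydata and the fit. In case you also want to account for an error on the ydata itself, the curve_fit function provides the sigma argument: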

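sigma : None or N-length sequence

If not None, it represents the standard deviation of ydata. This vector, if given, will be used as weights in the least-squares problem.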

A complete example:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0,4,20)
y = func(x, 5, 3, 4)
# error on ydata: a standard deviation must be positive, here 20% of y
y_sigma = 0.2 * y
# generate noisy ydata consistent with that error
yn = y + y_sigma * np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn, sigma = y_sigma)

# plot
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(x, yn, yerr = y_sigma, fmt = 'o')
ax.plot(x, np.polyval(popt, x), '-')
ax.text(0.5, 100, r"a = {0:.3f} +/- {1:.3f}".format(popt[0], pcov[0,0]**0.5))
ax.text(0.5, 90, r"b = {0:.3f} +/- {1:.3f}".format(popt[1], pcov[1,1]**0.5))
ax.text(0.5, 80, r"c = {0:.3f} +/- {1:.3f}".format(popt[2], pcov[2,2]**0.5))
ax.grid()
plt.show()

[Figure: the noisy data with error bars, the fitted curve, and the fitted coefficients with their errors]
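
One caveat on the sigma argument, as far as I understand the curve_fit documentation: by default (absolute_sigma=False) sigma is used only as relative weights, and pcov is rescaled to match the scatter of the residuals. If the values in your third list are real one-sigma errors in the units of y, passing absolute_sigma=True makes pcov reflect them directly. A sketch, where the 20% relative error and the seed are assumptions made for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

rng = np.random.default_rng(1)          # fixed seed for reproducibility
x = np.linspace(0, 4, 20)
y = func(x, 5, 3, 4)
y_sigma = 0.2 * y                       # assumed one-sigma error of each point
yn = y + y_sigma * rng.normal(size=len(x))

# sigma treated as relative weights (the default)
popt_rel, pcov_rel = curve_fit(func, x, yn, sigma=y_sigma)
# sigma treated as absolute errors in the units of y
popt_abs, pcov_abs = curve_fit(func, x, yn, sigma=y_sigma, absolute_sigma=True)

perr_rel = np.sqrt(np.diag(pcov_rel))
perr_abs = np.sqrt(np.diag(pcov_abs))
print(popt_abs)
print(perr_rel)
print(perr_abs)
```

The best-fit coefficients are the same in both calls; only the reported uncertainties differ, and they differ by one common factor.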

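A side note on using numpy arrays

One of the main advantages of using numpy is that you can avoid for loops, because operations on arrays are applied elementwise. So the for loop in your example can also be written as follows: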

trendx = arange(datasetx[0], (datasetx[-1]+1))
trendy = trend[0]*trendx**2 + trend[1]*trendx + trend[2]

I use arange instead of range because it returns a numpy array instead of a list. In this case you can also use the numpy function polyval:

trendy = polyval(trend, trendx)
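
If you would rather stay within NumPy, numpy.polyfit itself accepts a weights argument w and a cov flag in recent versions, and with cov=True it returns the covariance matrix of the coefficients alongside the fit. A hedged sketch with made-up data (datasetx, datasety and dataset_err below are stand-ins for your three lists, and the constant error of 2.0 is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)               # fixed seed for reproducibility
datasetx = np.arange(0, 21)
dataset_err = np.full(datasetx.shape, 2.0)   # assumed one-sigma error per point
datasety = (5*datasetx**2 + 3*datasetx + 4
            + dataset_err * rng.normal(size=datasetx.size))

# for Gaussian errors the polyfit documentation says to use w = 1/sigma
# (not 1/sigma**2)
trend, cov = np.polyfit(datasetx, datasety, 2, w=1.0/dataset_err, cov=True)

# standard deviations of the coefficients, highest power first
trend_err = np.sqrt(np.diag(cov))

for coef, err in zip(trend, trend_err):
    print("{0:.3f} +/- {1:.3f}".format(coef, err))
```

With cov=True the covariance is scaled by the residuals of the fit, much like the default behaviour of curve_fit. As far as I know, newer NumPy releases also accept cov='unscaled' for the case where the weights are trusted as absolute errors, but check the documentation of your installed version.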

Answer 2

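I have not been able to find any way of getting the errors in the coefficients that is built in to numpy or python. I have a simple tool that I wrote based on Sections 8.5 and 8.6 of John Taylor's An Introduction to Error Analysis. Maybe this will be sufficient for your task (note that the default return is the variance, not the standard deviation). You can get large errors (as in the provided example) because of significant covariance.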

def leastSquares(xMat, yMat):
    '''
    Purpose
    -------
    Perform least squares using the procedure outlined in 8.5 and 8.6 of Taylor, solving
    matrix equation X a = Y

    Examples
    --------
    >>> from numpy import matrix
    >>> xMat = matrix([[  1,   5,  25],
    ...                [  1,   7,  49],
    ...                [  1,   9,  81],
    ...                [  1,  11, 121]])
    >>> # matrix has rows of format [constant, x, x^2]
    >>> yMat = matrix([[142],
    ...                [168],
    ...                [211],
    ...                [251]])
    >>> a, varCoef, yRes = leastSquares(xMat, yMat)
    >>> # a is a column matrix, holding the three coefficients a, b, c, corresponding to
    >>> # the equation a + b*x + c*x^2

    Returns
    -------
    a: matrix
        best fit coefficients
    varCoef: matrix
        variance of derived coefficients
    yRes: matrix
        y-residuals of fit
    '''
    # number of measurements (rows) and number of fitted coefficients (columns)
    xMatSize = xMat.shape
    numMeas = xMatSize[0]
    numVars = xMatSize[1]

    # normal equations: (X^T X) a = X^T Y
    xxMat = xMat.T * xMat
    xyMat = xMat.T * yMat
    xxMatI = xxMat.I

    aMat = xxMatI * xyMat
    yAvgMat = xMat * aMat
    yRes = yMat - yAvgMat

    # residual variance, divided by the degrees of freedom
    var = (yRes.T * yRes) / (numMeas - numVars)
    varCoef = xxMatI.diagonal() * var[0, 0]

    return aMat, varCoef, yRes
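
For reference, the same calculation can be sketched with plain NumPy arrays instead of the matrix class. The function name least_squares_var is hypothetical, and this is only an illustration of the formulas used above under the same assumptions (equal, unknown errors on all y values), not a replacement for the function in this answer:

```python
import numpy as np

def least_squares_var(X, y):
    """Solve X a = y in the least-squares sense.

    Returns the coefficients, their variances and the y-residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    num_meas, num_vars = X.shape

    xtx_inv = np.linalg.inv(X.T @ X)          # (X^T X)^-1
    coef = xtx_inv @ (X.T @ y)                # best-fit coefficients
    resid = y - X @ coef                      # y-residuals of the fit

    # residual variance per degree of freedom
    var = resid @ resid / (num_meas - num_vars)
    var_coef = np.diag(xtx_inv) * var         # variance of each coefficient
    return coef, var_coef, resid

x = np.array([5.0, 7.0, 9.0, 11.0])
y = np.array([142.0, 168.0, 211.0, 251.0])
X = np.column_stack([np.ones_like(x), x, x**2])   # rows of [constant, x, x^2]

coef, var_coef, resid = least_squares_var(X, y)
std_coef = np.sqrt(var_coef)                       # standard deviations
print(coef)
print(std_coef)
```

The coefficients come back in the order constant, x, x^2, which is the reverse of the order that polyfit uses.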

Python - calculating trendlines with errors
