Question

import matplotlib.pyplot as plt
import numpy as np

# This is my data set
x = [15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240]
y = [1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23]

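I want to fit three linear regressions to this data set. By plotting the data with pyplot, I can see by eye roughly where the kinks start to form (around x = 105 and x = 165), so I could create three linear regressions (from x = 0 to 105, 105 to 165, and 165 to 240). But how do I do this scientifically? In other words, I want to fit three linear regressions to my data in a way that minimizes chi-squared. Is there a way to do this in code?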

Answer 1

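Below you can find the code and output of an automated procedure that uses scipy.stats.linregress; an explanation follows the code. The output looks like this:

The slope and intercept terms are: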

Curve 1: -0.0067 * x + 1.10
Curve 2: -0.0033 * x + 0.85
Curve 3: -0.0013 * x + 0.55

Here is the code:

from scipy import stats
import matplotlib.pyplot as plt
import numpy as np

x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240])
y = np.array([1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23])

# get slope of your data
dif = np.diff(y) / np.diff(x)

# determine the change of the slope
difdif = np.diff(dif)

# define a threshold for the allowed change of the slope
threshold = 0.001

# get indices where the diff returns value larger than a threshold
indNZ = np.where(abs(difdif) > threshold)[0]

# this makes plotting easier and avoids a couple of if clauses
indNZ += 1
indNZ = np.append(indNZ, len(x))
indNZ = np.insert(indNZ, 0, 0)

# plot the data
plt.scatter(x, y)

for indi, ind in enumerate(indNZ):

    if ind < len(x):
        slope, intercept, r_value, p_value, std_err = stats.linregress(x[ind:indNZ[indi+1]], y[ind:indNZ[indi+1]])
        plt.plot(x[ind:indNZ[indi+1]], slope * x[ind:indNZ[indi+1]] + intercept)

plt.show()

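First, the slope between consecutive points can be computed with np.diff. Applying np.diff to those slopes then reveals the points where the slope changes significantly. In the code above I use a threshold for this: if you always deal with perfect lines, it can be set to a very small value; if your data are noisy, you will have to tune it.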
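To make the role of the threshold concrete, here is a minimal sketch that wraps the same np.diff steps in a helper. The name `find_kinks` and the second threshold value (0.0025) are illustrative assumptions, not part of the answer's code.

```python
import numpy as np

x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240])
y = np.array([1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23])


def find_kinks(x, y, threshold):
    """Hypothetical helper: index of the first point of each new segment."""
    # slope between each pair of neighbouring points
    slopes = np.diff(y) / np.diff(x)
    # absolute change of the slope from one interval to the next
    slope_change = np.abs(np.diff(slopes))
    # shift by one so the index points at the start of the new segment
    return np.where(slope_change > threshold)[0] + 1


kinks_default = find_kinks(x, y, threshold=0.001)   # threshold used in the answer
kinks_strict = find_kinks(x, y, threshold=0.0025)   # assumed larger value, for comparison

print(kinks_default, x[kinks_default])
print(kinks_strict, x[kinks_strict])
```

With the answer's threshold, both slope changes are detected (indices 4 and 9, i.e. x = 75 and x = 150). With the larger value only the first one survives, because the second slope change (about 0.002) falls below it.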

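Once you have the indices where the slope changes significantly, you can run a linear regression on each corresponding section and plot the results.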
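The per-section fit can be sketched without the plotting as follows. This is an illustration only: it uses np.polyfit in place of scipy.stats.linregress, and the boundary list `bounds` is hard-coded to the values the code above produces for this data set.

```python
import numpy as np

x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240])
y = np.array([1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23])

# Segment boundaries (assumed here; they match indNZ from the answer's code)
bounds = [0, 4, 9, 16]

fits = []
for start, stop in zip(bounds[:-1], bounds[1:]):
    # degree-1 least-squares fit; coefficients come back highest power first
    slope, intercept = np.polyfit(x[start:stop], y[start:stop], deg=1)
    fits.append((slope, intercept))
    print(f"x[{start}:{stop}]: {slope:.4f} * x + {intercept:.2f}")
```

Pairing each boundary with the next one via `zip` avoids the explicit `if ind < len(x)` check in the loop.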

The for loop in more detail:

indNZ

is

array([ 0,  4,  9, 16])

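This gives you the intervals for the three lines. The blue line corresponds to the part from x[0] to x[3], the green line to the part from x[4] to x[8], and the red line to the part from x[9] to x[15]. Inside the for loop, these ranges are selected, the linear fit is done with scipy.stats.linregress (which can be swapped for another linear-fit routine, e.g. np.polyfit, if you prefer), and the line is then plotted using the equation slope * x + intercept.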
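The threshold approach above locates the kinks from the slope changes; it does not explicitly minimize chi-squared. As a rough sketch of the latter, one could try every possible pair of breakpoints and keep the split with the smallest total sum of squared residuals. This is a different technique (an exhaustive search), not the method used in the code above. It assumes equal uncertainties on all points, so that chi-squared reduces to the plain sum of squared residuals. The names `segment_sse`, `best_breakpoints` and `min_points` are hypothetical.

```python
from itertools import combinations

import numpy as np

x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240])
y = np.array([1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23])


def segment_sse(xs, ys):
    """Sum of squared residuals of a straight-line least-squares fit."""
    coeffs = np.polyfit(xs, ys, deg=1)
    return float(np.sum((np.polyval(coeffs, xs) - ys) ** 2))


def best_breakpoints(x, y, n_segments=3, min_points=3):
    """Exhaustive search over all splits into n_segments contiguous pieces."""
    n = len(x)
    best_sse, best_bounds = np.inf, None
    for cuts in combinations(range(1, n), n_segments - 1):
        bounds = (0, *cuts, n)
        pieces = list(zip(bounds[:-1], bounds[1:]))
        # skip splits with segments too short for a meaningful fit
        if any(stop - start < min_points for start, stop in pieces):
            continue
        sse = sum(segment_sse(x[a:b], y[a:b]) for a, b in pieces)
        if sse < best_sse:
            best_sse, best_bounds = sse, bounds
    return best_bounds, best_sse


bounds, sse = best_breakpoints(x, y)
print(bounds, sse)
```

For 16 points and three segments this is only 105 candidate splits, so brute force is cheap. On this particular data set several splits tie at essentially zero residual, because the kink points (x = 75 and x = 150) lie on both neighbouring lines; which of the tied splits is returned depends on floating-point noise. With real measurement uncertainties, each squared residual would be divided by its variance before summing.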

How to minimize chi-squared for 3 linear fits
