假设最适合一组点的线是：

Question

下面是我的文本文件中散点绘图数据的代码。我打开的文件包含两列。左列是x坐标，右列是y坐标。代码创建x与y的散点图。我需要一个代码来覆盖散点图中数据的最佳拟合线，并且没有内置的pylab函数对我有用。

from matplotlib import *
from pylab import *

with open('file.txt') as f:
   data = [line.split() for line in f.readlines()]
   out = [(float(x), float(y)) for x, y in data]
for i in out:
   scatter(i[0],i[1])
   xlabel('X')
   ylabel('Y')
   title('My Title')
show()

Answer 1

用于绘制最佳拟合线的this excellent answer的单行版本是：

plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)))

使用np.unique(x)代替x来处理x未排序或重复值的情况。

Answer 2

假设最适合一组点的线是：

y = a + b * x

哪里：

b = ( sum(xi * yi) - n * xbar * ybar ) / sum((xi - xbar)^2)
a = ybar - b * xbar

代码和图

# sample points 
X = [0, 5, 10, 15, 20]
Y = [0, 7, 10, 13, 20]

# solve for a and b
def best_fit(X, Y):

    xbar = sum(X)/len(X)
    ybar = sum(Y)/len(Y)
    n = len(X) # or len(Y)

    numer = sum([xi*yi for xi,yi in zip(X, Y)]) - n * xbar * ybar
    denum = sum([xi**2 for xi in X]) - n * xbar**2

    b = numer / denum
    a = ybar - b * xbar

    print('best fit line:\ny = {:.2f} + {:.2f}x'.format(a, b))

    return a, b

# solution
a, b = best_fit(X, Y)
#best fit line:
#y = 0.80 + 0.92x

# plot points and fit line
import matplotlib.pyplot as plt
plt.scatter(X, Y)
yfit = [a + b * xi for xi in X]
plt.plot(X, yfit)

更新

notebook version

Answer 3

你可以使用numpy的polyfit。我使用以下内容（您可以安全地删除关于确定系数和误差范围的位，我认为它看起来不错）：

#!/usr/bin/python3

import numpy as np
import matplotlib.pyplot as plt
import csv

with open("example.csv", "r") as f:
    data = [row for row in csv.reader(f)]
    xd = [float(row[0]) for row in data]
    yd = [float(row[1]) for row in data]

# sort the data
reorder = sorted(range(len(xd)), key = lambda ii: xd[ii])
xd = [xd[ii] for ii in reorder]
yd = [yd[ii] for ii in reorder]

# make the scatter plot
plt.scatter(xd, yd, s=30, alpha=0.15, marker='o')

# determine best fit line
par = np.polyfit(xd, yd, 1, full=True)

slope=par[0][0]
intercept=par[0][1]
xl = [min(xd), max(xd)]
yl = [slope*xx + intercept  for xx in xl]

# coefficient of determination, plot text
variance = np.var(yd)
residuals = np.var([(slope*xx + intercept - yy)  for xx,yy in zip(xd,yd)])
Rsqr = np.round(1-residuals/variance, decimals=2)
plt.text(.9*max(xd)+.1*min(xd),.9*max(yd)+.1*min(yd),'$R^2 = %0.2f$'% Rsqr, fontsize=30)

plt.xlabel("X Description")
plt.ylabel("Y Description")

# error bounds
yerr = [abs(slope*xx + intercept - yy)  for xx,yy in zip(xd,yd)]
par = np.polyfit(xd, yerr, 2, full=True)

yerrUpper = [(xx*slope+intercept)+(par[0][0]*xx**2 + par[0][1]*xx + par[0][2]) for xx,yy in zip(xd,yd)]
yerrLower = [(xx*slope+intercept)-(par[0][0]*xx**2 + par[0][1]*xx + par[0][2]) for xx,yy in zip(xd,yd)]

plt.plot(xl, yl, '-r')
plt.plot(xd, yerrLower, '--r')
plt.plot(xd, yerrUpper, '--r')
plt.show()

Answer 4

已经实施了@Micah的解决方案来生成一个有一些变化的趋势线，并且认为我分享了：

编码为函数
多项式趋势线的选项（输入order=2）
功能也可以返回确定系数（R ^ 2，输入Rval=True）
更多Numpy数组优化

代码：

def trendline(xd, yd, order=1, c='r', alpha=1, Rval=False):
    """Make a line of best fit"""

    #Calculate trendline
    coeffs = np.polyfit(xd, yd, order)

    intercept = coeffs[-1]
    slope = coeffs[-2]
    power = coeffs[0] if order == 2 else 0

    minxd = np.min(xd)
    maxxd = np.max(xd)

    xl = np.array([minxd, maxxd])
    yl = power * xl ** 2 + slope * xl + intercept

    #Plot trendline
    plt.plot(xl, yl, c, alpha=alpha)

    #Calculate R Squared
    p = np.poly1d(coeffs)

    ybar = np.sum(yd) / len(yd)
    ssreg = np.sum((p(xd) - ybar) ** 2)
    sstot = np.sum((yd - ybar) ** 2)
    Rsqr = ssreg / sstot

    if not Rval:
        #Plot R^2 value
        plt.text(0.8 * maxxd + 0.2 * minxd, 0.8 * np.max(yd) + 0.2 * np.min(yd),
                 '$R^2 = %0.2f$' % Rsqr)
    else:
        #Return the R^2 value:
        return Rsqr

Answer 5

from sklearn.linear_model import LinearRegression

X, Y = x.reshape(-1,1), y.reshape(-1,1)
plt.plot( X, LinearRegression().fit(X, Y).predict(X) )

代码为python中散点图的最佳拟合直线

5 个答案:

假设最适合一组点的线是：

代码和图

更新