Question

我试图使曲线拟合matplotlib生成的直方图中的值：

n, bins, patches = plt.hist(myData)

在哪里＆＃34; plt＆＃34;代表matplotlib.pyplot，myData是一个数组，每个索引的出现次数如[9,3,3，....]

我希望bins是我的x数据，而n是我的y数据。也就是说，我想提取有关x号码与号码x的频率的信息。但是，我不能让箱子和n具有相同的尺寸。

所以基本上，我希望能够将曲线拟合到n（bins，params）。

如何做到这一点？

Answer 1

来自matplotlib.pyplot.hist的文档：

返回

n：数组或数组列表

直方图箱的值。有关可能的语义的说明，请参阅normed和weights。如果输入x是一个数组，那么这是一个长度为nbins的数组。如果输入是一个序列数组[data1, data2,..]，那么这是一个数组列表，其中每个数组的直方图值的顺序相同。

bins：array

垃圾箱的边缘。长度nbins + 1（nbins左边缘和最后一个bin的右边缘）。即使传入多个数据集，也始终是一个数组。

补丁：列表或列表列表

如果有多个输入数据集，用于创建直方图或此列表列表的各个补丁的静音列表。

正如你所看到的那样，第二次返回实际上就是垃圾箱的边缘，所以它包含的物品多于垃圾箱。

获取垃圾箱中心的最简单方法是：

import numpy as np
bin_center = bin_borders[:-1] + np.diff(bin_borders) / 2

这只是将两个边框（框的宽度）之间的宽度的一半（np.diff）添加到左边框边框。排除最后一个bin边框，因为它是最右边bin的右边框。

所以这实际上会返回bin中心 - 一个与n长度相同的数组。

请注意，如果您有numba，则可以加快边框到中心的计算速度：

import numba as nb

@nb.njit
def centers_from_borders_numba(b):
    centers = np.empty(b.size - 1, np.float64)
    for idx in range(b.size - 1):
        centers[idx] = b[idx] + (b[idx+1] - b[idx]) / 2
    return centers

def centers_from_borders(borders):
    return borders[:-1] + np.diff(borders) / 2

速度要快得多：

bins = np.random.random(100000)
bins.sort()

# Make sure they are identical
np.testing.assert_array_equal(centers_from_borders_numba(bins), centers_from_borders(bins))

# Compare the timings
%timeit centers_from_borders_numba(bins)
# 36.9 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit centers_from_borders(bins)
# 150 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

即使它是更快的numba是一个非常重的依赖，你不要轻易添加。然而，玩起来很快很有趣，但在下面我将使用NumPy版本，因为它对大多数未来的访问者会更有帮助。

关于将函数拟合到直方图的一般任务：您需要定义一个适合数据的函数，然后您可以使用scipy.optimize.curve_fit。例如，如果您想要拟合高斯曲线：

import numpy as np
import matplotlib.pyplot as plt

from scipy.optimize import curve_fit

然后定义拟合的函数和一些样本数据集。样本数据集仅用于此问题的目的，您应该使用数据集并定义您想要适合的函数：

def gaussian(x, mean, amplitude, standard_deviation):
    return amplitude * np.exp( - ((x - mean) / standard_deviation) ** 2)

x = np.random.normal(10, 5, size=10000)

拟合曲线并绘制曲线：

bin_heights, bin_borders, _ = plt.hist(x, bins='auto', label='histogram')
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2
popt, _ = curve_fit(gaussian, bin_centers, bin_heights, p0=[1., 0., 1.])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)
plt.plot(x_interval_for_fit, gaussian(x_interval_for_fit, *popt), label='fit')
plt.legend()

请注意，您也可以使用NumPys histogram和Matplotlibs bar-plot代替。区别在于np.histogram不返回“patches”数组，并且您需要Matplotlibs条形图的bin宽度：

bin_heights, bin_borders = np.histogram(x, bins='auto')
bin_widths = np.diff(bin_borders)
bin_centers = bin_borders[:-1] + bin_widths / 2
popt, _ = curve_fit(gaussian, bin_centers, bin_heights, p0=[1., 0., 1.])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)

plt.bar(bin_centers, bin_heights, width=bin_widths, label='histogram')
plt.plot(x_interval_for_fit, gaussian(x_interval_for_fit, *popt), label='fit', c='red')
plt.legend()

当然，您也可以在直方图中使用其他功能。我通常喜欢Astropy s models for fitting，因为您不需要自己创建函数，它也支持复合模型和不同的拟合。

例如，使用Astropy将高斯曲线拟合到数据集：

from astropy.modeling import models, fitting

bin_heights, bin_borders = np.histogram(x, bins='auto')
bin_widths = np.diff(bin_borders)
bin_centers = bin_borders[:-1] + bin_widths / 2

t_init = models.Gaussian1D()
fit_t = fitting.LevMarLSQFitter()
t = fit_t(t_init, bin_centers, bin_heights)

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)
plt.figure()
plt.bar(bin_centers, bin_heights, width=bin_widths, label='histogram')
plt.plot(x_interval_for_fit, t(x_interval_for_fit), label='fit', c='red')
plt.legend()

只需更换：

即可为数据拟合不同的模型

t_init = models.Gaussian1D()

使用不同的型号。例如Lorentz1D（如高斯但有更宽的尾巴）：

t_init = models.Lorentz1D()

根据我的样本数据，这不是一个好的模型，但如果已经有一个符合需求的Astropy模型，它真的很容易使用。

Answer 2

一种实现方法是使用与直方图相同的参数绘制曲线的PDF或PMF。例如，如果您认为要检查直方图如何拟合正态分布，则可以使用与直方图相同的均值和方差来绘制正态PDF。使用numpy.random.normal()函数创建随机法线并绘制其PDF。在Numpy官方页面上说明了这样做的方法：https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html

在Python中将曲线拟合到直方图

2 个答案:

返回

n：数组或数组列表

bins：array

补丁：列表或列表列表