我仍在使用python进行编程。 这是我第一次尝试使用直方图和拟合!
特别是,我有一个数据集,并对其做了直方图。在这一点上,我应该进行瑞利拟合,但是我无法找出正确设置参数的正确方法。我读到了loc和scale,应该将fit的参数通常设置为0和1。显然,这种方式的拟合效果不好!!!有没有人可以帮助我? 为了清楚起见,我要附加我正在使用的代码。
谢谢。
pipeline.set_params(hyperparameter=1, ...)
我的数据(fondi)是:[13 15 13 14 12 13 12 14 15 12 11 10 11 15 18 11 11 11 13 15 15 15 15 12 12
13 12 15 15 15 12 12 11 14 16 11 13 14 16 17 24 21 16 20 18 18 19 21 22
19 15 16 15 13 14 16 18 21 19 22 14 13 14 15 14 17 19 17 16 18 12 15 17
17 16 17 16 19 17 14 13 16 16 13 15 17 17 20 18 17 12 19 14 15 15 14 13
17 16 14 12 11 12 20 19 16 24 19 20 19 17 16 17 16 19 22 17 16 20 22 21
22 20 14 18 16 19 20 17 20 22 20 22 19 17 13 16 18 14 16 20 20 18 19 19
16 19 12 12 14 14 13 15 16 16 19 16 17 12 11 11 10 12 11 11 13 14 13 17
8 8 8 10 10 10 14 16 11 9 9 11 10 17 13 15 19 15 13 16 17 14 12 13
14 11 10 15 13 12 12 11 10 9 9 9 9 8 15 16 12 9 11 9 10 10 7 7
7 21 19 13 10 15 12 10 10 9 8 10 20 14 13 11 13 15 14 10 11 12 16 17
15 12 13 16 15 13 14 17 14 13 15 13 11 14 15 17 18 22 22 16 16 17 22 17 17
18 26 17 19 21 16 15 19 19 22 19 18 17 18 18 12 17 17 17 18 18 14 16 20 17
16 16 18 16 19 18 18 20 20]
输出:loc = 6.783540954380711 scale = 6.430045149216335
答案 0 :(得分:3)
下面是一个简单的过程,可从Rayleigh distribution提取试验数据集,然后使用Maximum Likelihood Estimation方法提供的scipy.stats.rv_continuous.fit
查找其参数:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Create a Continuous Variable:
X = stats.rayleigh(loc=10, scale=5)
# Sample from this Random variable
x0 = X.rvs(size=10000, random_state=123)
# Adjust Distribution parameters
loc, scale = stats.rayleigh.fit(x0) # (9.990726961181025, 4.9743913760956335)
# Tabulate over sample range (PDF display):
xl = np.linspace(x0.min(), x0.max(), 100)
# Display Results:
fig, axe = plt.subplots()
axe.hist(x0, density=1, label="Sample")
axe.plot(xl, X.pdf(xl), label="Exact Distribution")
axe.plot(xl, stats.rayleigh(scale=scale, loc=loc).pdf(xl), label="Adjusted Distribution")
axe.set_title("Distribution Fit")
axe.set_xlabel("Variable, $x$ $[\mathrm{AU}]$")
axe.set_ylabel("Density, $f(x)$ $[\mathrm{AU}^{-1}]$")
axe.legend()
axe.grid()
它呈现如下:
我想提请您注意以下要点:
matplotlib
估算垃圾箱的数量; scipy.stats
中,它们会尽一切可能以这种方式对每个可用分布进行规范化。要找出与usual parametric distribution definition的对应关系,您需要解决以下问题:pdf(x) = pdf(y)/scale
其中y = (x-loc)/scale
。在这种情况下,您将看到scale
参数等效于sigma
,并且这对于原点偏移是不变的(不依赖于loc
值); stats.rayleigh.fit(x0)
)。这部分独立于matplotlib
绘制的任何图形,由scipy
处理,该图形在完整数据集上执行MLE(这就是为什么更改bin只会影响直方图显示而没有其他影响的原因。) / li>
根据您的帖子更新,我完成了我的回答。使用您提供的数据集:
x0 = np.array([13, 15, 13, 14, 12, 13, 12, 14, 15, 12, 11, 10, 11, 15, 18, 11, 11, 11, 13,
15, 15, 15, 11, 12, 13, 12, 15, 15, 15, 12, 12, 11, 14, 16, 11, 13, 14, 16,
17, 24, 21, 16, 20, 18, 18, 19, 21, 22, 19, 15, 16, 15, 13, 14, 16, 18, 21,
19, 22, 14, 13, 14, 15, 14, 17, 19, 17, 16, 18, 12, 15, 17, 17, 16, 17, 16,
19, 17, 14, 13, 16, 16, 13, 15, 17, 17, 20, 18, 17, 12, 19, 14, 15, 15, 14,
13, 17, 16, 14, 12, 11, 12, 20, 19, 16, 24, 19, 20, 19, 17, 16, 17, 16, 19,
22, 17, 16, 20, 22, 21, 22, 20, 14, 18, 16, 19, 20, 17, 20, 22, 20, 22, 19,
17, 13, 16, 18, 14, 16, 20, 20, 18, 19, 19, 16, 19, 12, 12, 14, 14, 13, 15,
16, 16, 19, 16, 17, 12, 11, 11, 10, 12, 11, 11, 13, 14, 13, 17, 8, 8, 8, 10,
10, 10, 14, 16, 11, 9, 9, 11, 10, 17, 13, 15, 19, 15, 13, 16, 17, 14, 12, 13,
14, 11, 10, 15, 13, 12, 12, 11, 10, 9, 9, 9, 9, 8, 15, 16, 12, 9, 11, 9, 10,
10, 7, 7, 7, 21, 19, 13, 10, 15, 12, 10, 10, 9, 8, 10, 20, 14, 13, 11, 13, 15,
14, 10, 11, 12, 16, 17, 15, 12, 13, 16, 15, 13, 14, 17, 14, 13, 15, 13, 11, 14,
15, 17, 18, 22, 21, 16, 17, 22, 17, 17, 18, 26, 17, 19, 21, 16, 15, 19, 19, 22,
19, 18, 17, 18, 18, 12, 17, 17, 17, 18, 14, 16, 20, 17, 16, 16, 18, 16, 19, 18,
18, 20, 18])
我们可以尝试调整瑞利分布:
p = stats.rayleigh.fit(x0)
X = stats.rayleigh(*p)
从视觉上看,合身度不是很好:
让我们通过统计测试进行确认。首先,我们可以使用ECDF Kolmogorov-Smirnov检查Test是否与调整后的分布的CDF兼容:
kst = stats.kstest(x0, X.cdf)
# KstestResult(statistic=0.12701044409231593, pvalue=0.0001232197856051324)
我们还可以评估调整后的分布的预期计数,并使用Chi Square Test将它们与有礼貌的计数进行比较:
c, b = np.histogram(x0)
ct = np.diff(X.cdf(b))*np.sum(c)
c2t = stats.chisquare(c, ct, ddof=2)
# Power_divergenceResult(statistic=31.874916914227434, pvalue=4.284273564311872e-05)
自由度的差等于2,因为除了卡方统计量外,我们还必须估计瑞利分布的loc
和scale
参数(因此在测试中为ddof=2
致电)。
这两个测试的p-value都非常低且相似,这意味着很难满足零假设(因此告诉我们应该拒绝它们):
H0
=样本是从参考分布中提取的; H0
=类别在观察分布和预期分布之间没有差异; 那么很难相信您的数据集来自调整后的瑞利分布。
您可以将这些结果与MCVE中绘制的综合数据进行比较,测试返回的p值会超过10%:
# KstestResult(statistic=0.0097140857969642, pvalue=0.3019167138216704)
# Power_divergenceResult(statistic=11.170065854104491, pvalue=0.13137094282775724)
在这种情况下我们无法拒绝H0,我们相信采样数据可能来自调整后的瑞利分布。