Question

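I'm trying to get scipy.stats.probplot to draw a QQ plot against a custom distribution. Basically, I have a bunch of numeric variables (all numpy arrays) and I want to check how their distributions differ using QQ plots.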

My data frame df looks like this:

         some_var  another_var
1        16.5704   3.3620
2        12.8373  -8.2204
3        8.1854    1.9617
4        13.5683   1.8376
5        8.5143    2.3173
6        6.0123   -7.7536
7        9.6775   -4.3874
...      ...       ...
189499   11.8561  -8.4887
189500   10.0422  -4.6228

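According to the reference:

dist : str or stats.distributions instance, optional

Distribution or distribution function name. The default is 'norm' for a normal probability plot. Objects that look enough like a stats.distributions instance (i.e. they have a ppf method) are also accepted.

Of course, a numpy array has no ppf method, so when I try the following: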

import scipy.stats as stats
stats.probplot(X[X.columns[1]].values, dist=X[X.columns[2]].values, plot=pylab)

I get the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'ppf'

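(Note: if I don't use .values, I get the same error, but for a 'Series' object instead of 'numpy.ndarray'.)

So the question is: what is an object with a ppf method, and how do I create one from a numpy array?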

Answer 1

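The "dist" object should be an instance or a class of one of scipy's statistical distributions. That is what this part of the documentation means:

dist : str or stats.distributions instance, optional

A self-contained example: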

import numpy
from matplotlib import pyplot
from scipy import stats

random_beta = numpy.random.beta(0.3, 2, size=37)

fig, ax = pyplot.subplots(figsize=(6, 3))

_ = stats.probplot(
    random_beta,       # data
    sparams=(0.3, 2),  # guesses at the distribution's parameters
    dist=stats.beta,   # the "dist" object
    plot=ax            # where the data should be plotted
)

This produces a probability plot of the sample against the quantiles of the beta distribution.

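If you want to plot several columns of a data frame, you need to call probplot once per column, drawing each time on the same (or a new) axes.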
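
A minimal sketch of that loop, assuming a small synthetic data frame in place of the one from the question (the column values, the normal reference distribution, and the one-axes-per-column layout are illustrative choices, not part of the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import numpy as np
import pandas as pd
from matplotlib import pyplot
from scipy import stats

# Hypothetical stand-in for the data frame in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "some_var": rng.normal(10, 3, size=500),
    "another_var": rng.normal(-2, 5, size=500),
})

# One axes per column; pass the same `ax` every time to overlay the plots instead.
fig, axes = pyplot.subplots(ncols=len(df.columns), figsize=(8, 3))
results = {}
for ax, column in zip(axes, df.columns):
    # With fit=True (the default), probplot returns
    # ((theoretical quantiles, ordered values), (slope, intercept, r)).
    results[column] = stats.probplot(df[column].to_numpy(), dist=stats.norm, plot=ax)
    ax.set_title(column)
fig.tight_layout()
```

Keeping the return values in a dictionary makes it easy to compare the fitted slope, intercept, and correlation coefficient across columns afterwards.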

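In this simple case, the probscale package doesn't add much. But if this is a direction you might take in the future, it can be more flexible, for example by letting you use probability scales rather than quantile scales: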

import probscale

fig, ax = pyplot.subplots(figsize=(6, 3))
fig = probscale.probplot(
    random_beta,
    ax=ax,
    plottype='qq',
    bestfit=True,
    dist=stats.beta(0.3, 2)
)
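
To compare one numpy array directly against another, as the question asks, one option is to wrap the reference array in a small object whose ppf method returns its empirical quantiles. The sketch below assumes that probplot only needs `dist.ppf(q)` from such an object, as the documentation quoted above suggests; `EmpiricalDist` is a hypothetical helper written for this illustration and is not part of scipy, and the two samples are synthetic.

```python
import numpy as np
from scipy import stats


class EmpiricalDist:
    """Hypothetical helper: exposes the sample quantiles of an array as ppf."""

    def __init__(self, sample):
        self.sample = np.sort(np.asarray(sample, dtype=float))

    def ppf(self, q):
        # Empirical quantile function, using numpy's default linear
        # interpolation between order statistics.
        return np.quantile(self.sample, q)


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=2000)  # plays the role of the custom distribution
data = 3.0 + 2.0 * rng.normal(size=500)      # sample to compare against it

# plot=None skips drawing; pass a matplotlib axes instead to get the QQ plot.
(osm, osr), (slope, intercept, r) = stats.probplot(
    data, dist=EmpiricalDist(reference), plot=None
)
print(slope, intercept, r)
```

If the two arrays come from the same family of distributions up to location and scale, the points fall close to a straight line, and the fitted slope and intercept estimate the scale and shift between them.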
