Question

In scipy, the negative binomial distribution is defined as:

nbinom.pmf(k) = choose(k+n-1, n-1) * p**n * (1-p)**k

This is the common definition; see also Wikipedia: https://en.wikipedia.org/wiki/Negative_binomial_distribution

However, there is a different parameterization in which the negative binomial is defined by the mean mu and a dispersion parameter.

This is easy in R, because the negative binomial can be defined with either parameterization:

dnbinom(x, size, prob, mu, log = FALSE)

How can I use the mean/dispersion parameterization in scipy?

Edit:

Straight from the R help:

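The negative binomial distribution with size = n and prob = p has density

Γ(x + n) / (Γ(n) x!) p^n (1 - p)^x

An alternative parameterization (often used in ecology) is by the mean mu (see above) and size, the dispersion parameter, where prob = size / (size + mu). In this parameterization the variance is mu + mu^2 / size.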

It is also described in more detail here:

https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations

Answer 1

from scipy.stats import nbinom


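def convert_params(mu, theta):
    """
    Convert the mean/dispersion parameterization of a negative binomial
    to the (n, p) parameterization that scipy supports.

    mu is the mean and theta is the dispersion parameter (R's `size`).

    See https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations
    """
    r = theta
    var = mu + 1 / r * mu ** 2  # variance: mu + mu^2 / theta
    p = (var - mu) / var        # failure probability in scipy's convention
    return r, 1 - p             # scipy expects the success probability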


def pmf(counts, mu, theta):
    """
    >>> import numpy as np
    >>> from scipy.stats import poisson
    >>> np.isclose(pmf(10, 10, 10000), poisson.pmf(10, 10), atol=1e-3)
    True
    """
    return nbinom.pmf(counts, *convert_params(mu, theta))


def logpmf(counts, mu, theta):
    return nbinom.logpmf(counts, *convert_params(mu, theta))


def cdf(counts, mu, theta):
    return nbinom.cdf(counts, *convert_params(mu, theta))


def sf(counts, mu, theta):
    return nbinom.sf(counts, *convert_params(mu, theta))
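
A quick way to sanity-check this conversion is to compare the mean and variance scipy reports against the values quoted from the R help. The sketch below is only an illustration: the helper name `nbinom_from_mean_dispersion` and the example values are made up, and it uses the simplified form p = theta / (theta + mu), which is algebraically the same as 1 - (var - mu) / var.

```python
from scipy.stats import nbinom


def nbinom_from_mean_dispersion(mu, theta):
    """Hypothetical helper: frozen scipy nbinom with mean mu and dispersion theta."""
    # Same relation as prob = size / (size + mu) in the R help.
    p = theta / (theta + mu)
    return nbinom(theta, p)


# Example values chosen only for illustration.
mu, theta = 4.0, 2.5
dist = nbinom_from_mean_dispersion(mu, theta)

# Mean and variance as computed by scipy.
mean, var = dist.stats(moments="mv")

# Values expected from the mean/dispersion parameterization.
expected_mean = mu
expected_var = mu + mu ** 2 / theta

print(float(mean), expected_mean)
print(float(var), expected_var)
```

Note that scipy accepts a non-integer first shape parameter for `nbinom`, so a real-valued dispersion can be passed through unchanged.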

Answer 2

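The Wikipedia page you linked gives exact formulas for p and r in terms of mu and sigma; see the last item in the alternative parameterization section: https://en.m.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations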
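
As a rough sketch of what that looks like in code, the formulas below are derived under scipy's convention, where p is the success probability, the mean is n(1 - p)/p and the variance is n(1 - p)/p^2. Solving those two equations gives p = mu / sigma^2 and n = mu^2 / (sigma^2 - mu). The function name `params_from_mean_sd` is hypothetical, and the meaning of p on the Wikipedia page may be the complement of scipy's, so it is worth checking against the page before relying on this.

```python
from scipy.stats import nbinom


def params_from_mean_sd(mu, sigma):
    """Hypothetical helper: map mean and standard deviation to scipy's (n, p)."""
    var = sigma ** 2
    # The negative binomial is overdispersed, so the variance must exceed the mean.
    if var <= mu:
        raise ValueError("negative binomial requires variance > mean")
    p = mu / var               # success probability in scipy's convention
    n = mu ** 2 / (var - mu)   # number of successes (may be non-integer)
    return n, p


# Example values chosen only for illustration.
n, p = params_from_mean_sd(mu=3.0, sigma=3.0)

# Round trip: scipy should report the mean and variance we started from.
mean, var = nbinom.stats(n, p, moments="mv")
print(float(mean), float(var))
```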

Alternative parameterization of the negative binomial in scipy
