Question

How can I generate Lomax (Pareto Type II) random numbers in R?

If U ∈ [0, 1) is a uniformly distributed random variable, then

L(xm, α) = P(xm, α) − xm

generates a Lomax-distributed random variable.

Answer 1

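As an alternative to VGAM::rlomax, it is not hard to write your own inverse transform sampling random number generator for the Lomax distribution.

The cdf of the Lomax distribution with shape and scale parameters is given by F(x) = 1 - (1 + x / scale)^(-shape). All we need to do is solve F(F^(-1)(u)) = u for F^(-1)(u), where u ~ Unif(0, 1).

Using that solution, we can define the following function to draw Lomax random samples: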
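
Spelled out, the inversion takes only a few lines of algebra. In the sketch below, u stands for the uniform draw, σ for the scale and α for the shape; these symbols are chosen here for brevity and are not part of the original post.

```latex
\begin{align*}
u &= F(x) = 1 - \left(1 + \frac{x}{\sigma}\right)^{-\alpha} \\
1 - u &= \left(1 + \frac{x}{\sigma}\right)^{-\alpha} \\
(1 - u)^{-1/\alpha} &= 1 + \frac{x}{\sigma} \\
x = F^{-1}(u) &= \sigma\left[(1 - u)^{-1/\alpha} - 1\right]
\end{align*}
```

Since 1 - U is itself uniform on (0, 1) when U is, replacing 1 - u by u in the last line would give an equally valid sampler.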

rlomax.its <- function(N, scale, shape) {
    scale * ((1 - runif(N)) ^ (-1/shape) - 1)
}

We now draw N = 1e5 samples from a Lomax distribution with scale = 1 and shape = 2, and compare them with samples drawn with VGAM::rlomax:

library(VGAM);
N <- 1e5;
set.seed(2017);
x.VGAM <- rlomax(N, scale = 1, shape3.q = 2)
x.ITS <- rlomax.its(N, scale = 1, shape = 2)

summary(x.VGAM);
#Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#0.0000   0.1536   0.4143   0.9985   1.0006 925.0784

summary(x.ITS);
#Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#0.0000   0.1548   0.4158   1.0016   1.0086 280.3248
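
As a sanity check on the two summaries, the inverted cdf is also the theoretical quantile function, so the sample quartiles can be compared with exact values. The following is a minimal sketch; `qlomax.its` is a hypothetical helper name introduced here and is not part of VGAM.

```r
# Theoretical quantile function of the Lomax distribution (inverse of the cdf).
# qlomax.its is a hypothetical helper introduced here for illustration.
qlomax.its <- function(p, scale, shape) {
    scale * ((1 - p) ^ (-1 / shape) - 1)
}

# Same sampler as in the answer, repeated so this block runs on its own.
rlomax.its <- function(N, scale, shape) {
    scale * ((1 - runif(N)) ^ (-1 / shape) - 1)
}

set.seed(2017)
x <- rlomax.its(1e5, scale = 1, shape = 2)

# Compare theoretical and empirical quartiles.
probs <- c(0.25, 0.5, 0.75)
theo <- qlomax.its(probs, scale = 1, shape = 2)
emp <- quantile(x, probs, names = FALSE)
print(rbind(theoretical = theo, empirical = emp))
```

For scale = 1 and shape = 2 the theoretical quartiles are roughly 0.1547, 0.4142 (that is, sqrt(2) - 1) and exactly 1, in line with both summaries above.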

Let's compare the density plots from both methods for different sample sizes.

library(tidyverse);  # provides map, bind_rows, gather, %>% and ggplot
set.seed(2017);
bind_rows(map(
    setNames(2:5, paste0("N=10^", 2:5)),
    ~list(ITS = rlomax.its(10^(.x), 1, 2), VGAM = rlomax(10^(.x), 1, 2))),
    .id = "N") %>%
    gather(key, value, -N) %>%
    ggplot(aes(log10(value), fill = key)) +
    geom_density(alpha = 0.4) +
    facet_wrap(~ N)

Clearly, the distributions produced by the two methods converge as N grows.
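
A more quantitative check than overlaying densities is a one-sample Kolmogorov-Smirnov test of the inverse transform sample against the theoretical cdf. This is only a sketch under the same scale = 1, shape = 2 setting; `plomax.its` is a hypothetical helper name introduced here.

```r
# Lomax cdf; plomax.its is a hypothetical helper introduced for this sketch.
plomax.its <- function(q, scale, shape) {
    1 - (1 + q / scale) ^ (-shape)
}

# Same sampler as in the answer, repeated so this block runs on its own.
rlomax.its <- function(N, scale, shape) {
    scale * ((1 - runif(N)) ^ (-1 / shape) - 1)
}

set.seed(2017)
x <- rlomax.its(1e5, scale = 1, shape = 2)

# One-sample Kolmogorov-Smirnov test against the theoretical cdf;
# the extra arguments are passed on to plomax.its.
ks <- ks.test(x, plomax.its, scale = 1, shape = 2)
print(ks$statistic)
```

A small statistic, on the order of 1/sqrt(N), suggests the sample is consistent with the Lomax cdf.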

As to which method is faster, we can run a quick microbenchmark that draws N = 1e6 Lomax samples with each of the two methods:

library(microbenchmark);
res <- microbenchmark(
    ITS = rlomax.its(1e6, 1, 2),
    VGAM = rlomax(1e6, 1, 2))
#Unit: milliseconds
# expr       min        lq      mean    median        uq      max neval cld
#  ITS  79.22709  84.11703  88.48358  86.29181  91.07074 109.3536   100  a
# VGAM 159.56578 175.88731 218.92212 183.09769 222.64697 359.9311   100   b

library(tidyverse)
autoplot(res)

Let's look at how the runtime depends on the number of samples drawn:

library(tidyverse);
library(ggthemes);
res <- map_df(seq(2, 6, length.out = 20), function(x)
    cbind(x = 10^(x), microbenchmark(
        ITS = rlomax.its(10^(x), 1, 2),
        VGAM = rlomax(10^(x), 1, 2))))
res %>%
    mutate(N = factor(as.numeric(factor(x)))) %>%
    ggplot(aes(x = N, y = log10(time), colour = expr)) +
    geom_tufteboxplot(outlier.colour="transparent") +
    theme_minimal() +
    scale_x_discrete(
        breaks = c(1, 5, 10, 15, 20),
        labels = paste0("10^", 2:6))

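I haven't had time to explore this further, but it turns out that, on average, the inverse transform sampling method is slightly (but consistently) faster.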
