Question

I want to use mutate to compute a column whose values are drawn from a binomial distribution.

I have the following example:

library("dplyr")

d = data.frame(ref = rbinom(100,100,0.5))
d$coverage = 100
d$prob = 0.5
d$eprob= d$ref / d$coverage
d = tbl_df(d)

mutate(d,
       ref1= ref,
       cov1 = coverage,
       eprob1 = eprob,
       ref2=rbinom(1, coverage, eprob),
       ref3=rbinom(1, cov1, eprob1)
       )

The result is:

Source: local data frame [100 x 9]

   ref coverage prob eprob ref1 cov1 eprob1 ref2 ref3
1   52      100  0.5  0.52   52  100   0.52   45   44
2   50      100  0.5  0.50   50  100   0.50   45   44
3   45      100  0.5  0.45   45  100   0.45   45   44
4   45      100  0.5  0.45   45  100   0.45   45   44
5   47      100  0.5  0.47   47  100   0.47   45   44
6   46      100  0.5  0.46   46  100   0.46   45   44
7   50      100  0.5  0.50   50  100   0.50   45   44
8   53      100  0.5  0.53   53  100   0.53   45   44
9   44      100  0.5  0.44   44  100   0.44   45   44
10  56      100  0.5  0.56   56  100   0.56   45   44

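I don't understand this. I expected mutate to return, for each row, a random number drawn from the binomial distribution given by ref and coverage ("ref2"), but every row gets the same value.

mutate reads the columns correctly, but something strange happens when rbinom is called...

I'd appreciate any help.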

Answer 1

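Try changing the n argument of rbinom so that it matches the number of rows. With n = 1, rbinom returns a single value, which mutate then recycles down the whole column: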

mutate(d,
   ref1= ref,
   cov1 = coverage,
   eprob1 = eprob,
   ref2=rbinom(100, coverage, eprob),
   ref3=rbinom(100, cov1, eprob1)
)
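
If the number of rows is not known in advance, the hard-coded 100 could be replaced by dplyr's n(), which returns the number of rows in the current group (here, the whole data frame). A minimal sketch, assuming a reasonably recent dplyr; the seed and the shortened column list are illustrative choices and not part of the original post:

```r
library(dplyr)

set.seed(1)  # illustrative seed, only to make the sketch reproducible

# Rebuild a small version of the question's data frame
d <- data.frame(ref = rbinom(100, 100, 0.5))
d$coverage <- 100
d$eprob <- d$ref / d$coverage

# n() is the number of rows in the current group, so rbinom draws
# one value per row instead of one value recycled down the column
res <- mutate(d, ref2 = rbinom(n(), coverage, eprob))

head(res)
```

Because rbinom is vectorised over size and prob, row i of ref2 is drawn with that row's coverage and eprob.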

Answer 2

Another solution is:

d %>% rowwise() %>%
      mutate(ref1= ref,
             cov1 = coverage,
             eprob1 = eprob,
             ref2=rbinom(1, coverage, eprob),
             ref3=rbinom(1, cov1, eprob1))

rowwise() groups the data by each individual row, so rbinom(1, ...) is evaluated once per row and yields one random value per row.
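
Both answers rest on the same behaviour of rbinom, which can be checked in base R without dplyr. The sketch below uses made-up size and prob vectors (not the question's data) to show that n controls the length of the result, and that with n = 1 only the first elements of size and prob are used:

```r
set.seed(42)  # illustrative seed

# Hypothetical per-row parameters, one entry per "row"
size <- c(100, 100, 100)
prob <- c(0.1, 0.5, 0.9)

# n = 1: a single draw, using only size[1] and prob[1];
# inside mutate this one value is recycled to every row
one <- rbinom(1, size, prob)

# n = number of rows: element i is drawn with size[i] and prob[i]
many <- rbinom(length(prob), size, prob)

length(one)
length(many)
```

On large data frames the vectorised call is likely to be faster than rowwise(), since rowwise() evaluates the expression once per row.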

dplyr mutate using rbinom does not return random numbers
