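Storing simulation results as a data.table in R

Time: 2015-07-31 17:07:37

Tags: r data.table plyr

I need to run a large number of simulations, and they take a long time. I think data.table could reduce the processing time. How can I store the result of mdply(data.frame(prob=seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2) in a data.table without first saving its output to a data.frame?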

library(plyr)
df1 <- mdply(data.frame(prob=seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2)
library(data.table)
dt1 <- data.table(df1)

Edit

I know I can use setDT(df1) to avoid creating dt1. However, my main concern is that mdply itself produces a data.frame, and that step takes a lot of time.
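For reference, the difference between data.table(df1) and setDT(df1) can be sketched as follows. The small data.frame here is a stand-in for the mdply output, not the real simulation result, and the name dt_copy is only illustrative.

```r
library(data.table)

# Stand-in for the mdply result (hypothetical values, not simulated ones).
df1 <- data.frame(prob = seq(0.1, 0.9, by = 0.1), V1 = 0L)

# data.table() builds a new object; df1 itself stays a plain data.frame.
dt_copy <- data.table(df1)
is.data.table(df1)      # FALSE

# setDT() converts df1 by reference, so no second object is created.
setDT(df1)
is.data.table(df1)      # TRUE
```

Either way, the conversion happens after mdply has already done its work, so it does not address the time spent inside mdply.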

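1 Answer:

Answer 0: (score: 3)

plyr and data.table serve very similar purposes, so you usually do not need to switch back and forth between them. In this case, you can do everything with data.table: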

dt = data.table(prob = seq(0.1, 0.9, by = 0.1))
dt = dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob]
dt
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  0  0  0  0  1
3:  0.3  1  2  1  0  1
4:  0.4  1  1  2  1  0
5:  0.5  2  2  1  1  1
6:  0.6  1  1  0  0  1
7:  0.7  2  1  2  1  0
8:  0.8  2  1  2  0  1
9:  0.9  2  2  2  2  2
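To see what as.list contributes in the call above, here is a small sketch comparing the two shapes. It assumes a shorter probability grid, and the names long and wide are only illustrative. Returning a plain vector from j gives one row per draw, while wrapping it in as.list spreads the draws across columns V1 to V5.

```r
library(data.table)

set.seed(1)  # only so the sketch is reproducible
dt <- data.table(prob = c(0.2, 0.5, 0.8))

# A vector in j becomes rows: 5 draws per prob, stacked in long format.
long <- dt[, .(draw = rbinom(n = 5, size = 2, prob = prob)), by = prob]

# A list in j becomes columns: one row per prob, 5 draw columns.
wide <- dt[, as.list(rbinom(n = 5, size = 2, prob = prob)), by = prob]

dim(long)  # 15 2
dim(wide)  # 3 6
```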

I would add that my hunch is that the fastest approach is to build a matrix first and then attach it as columns.

> mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2)
> cbind(dt, t(mat))
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  1  0  0  1  1
3:  0.3  1  1  1  0  0
4:  0.4  1  0  2  1  1
5:  0.5  1  1  1  0  2
6:  0.6  2  0  2  1  1
7:  0.7  1  1  1  2  1
8:  0.8  1  2  1  0  2
9:  0.9  1  1  2  1  1
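Because rbinom recycles its prob argument across draws, the matrix can also be built with a single call instead of one rbinom call per row. This is a sketch of that variant, not part of the original answer and not timed here; the names p, k, draws, m and res are illustrative.

```r
library(data.table)

set.seed(42)                  # only so the sketch is reproducible
p <- seq(0.1, 0.9, by = 0.1)  # same probability grid as above
k <- 5                        # number of draws per probability

# One vectorised call: rep(p, each = k) yields k consecutive draws per prob.
draws <- rbinom(n = k * length(p), size = 2, prob = rep(p, each = k))

# byrow = TRUE places the k draws for p[i] in row i.
m <- matrix(draws, ncol = k, byrow = TRUE)

# Attach the draws as columns V1..V5 next to prob.
res <- cbind(data.table(prob = p), as.data.table(m))
dim(res)  # 9 6
```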

A quick test on a table of roughly 80,000 rows shows that this is faster:

> dt = data.table(prob = (seq(0.1, 0.9, by = 0.00001)))
> system.time(for(i in 1:10) dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob])
   user  system elapsed 
   6.14    0.00    6.16 
> system.time(for(i in 1:10) {mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2) ; cbind(dt, t(mat))})
   user  system elapsed 
   2.61    0.00    2.62 

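Both are a major improvement over the original:

> df = data.frame(prob = dt$prob)  # df was not defined in the original post; assumed to be the same prob grid as a data.frame
> system.time(for(i in 1:10) {df1 = mdply(df, rbinom, n = 5, size = 2) ; dt1 = data.table(df1)})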
   user  system elapsed 
 152.23   46.60  200.07