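I need to run a large number of simulations, which takes a lot of time. I think data.table could reduce the processing time. How can I store the result of mdply(data.frame(prob=seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2) in a data.table without first saving its output to a data.frame?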
library(plyr)
df1 <- mdply(data.frame(prob=seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2)
library(data.table)
dt1 <- data.table(df1)
Edit
I know I can use setDT(df1) to avoid creating dt1. However, the main problem is that mdply produces a data.frame, and building it takes a lot of time.
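For reference, a minimal sketch of the by-reference conversion mentioned above. A plain data.frame stands in for the mdply result here (an assumption, to keep the example free of plyr):

```r
library(data.table)

# Stand-in for the mdply output: any data.frame behaves the same way.
df1 <- data.frame(prob = seq(0.1, 0.9, by = 0.1))

setDT(df1)            # converts df1 in place; no second object is created
is.data.table(df1)    # TRUE
```

This removes the extra copy made by data.table(df1), but it does nothing about the time spent inside mdply itself.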
Answer 0 (score: 3)
plyr and data.table serve very similar purposes, so you usually don't need to switch back and forth between the two. In this case you can do everything with data.table:
dt = data.table(prob = seq(0.1, 0.9, by = 0.1))
dt = dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob]
dt
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  0  0  0  0  1
3:  0.3  1  2  1  0  1
4:  0.4  1  1  2  1  0
5:  0.5  2  2  1  1  1
6:  0.6  1  1  0  0  1
7:  0.7  2  1  2  1  0
8:  0.8  2  1  2  0  1
9:  0.9  2  2  2  2  2
I'd add that my hunch is that the fastest approach is to build the matrix first and then attach it as columns.
> mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2)
> cbind(dt, t(mat))
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  1  0  0  1  1
3:  0.3  1  1  1  0  0
4:  0.4  1  0  2  1  1
5:  0.5  1  1  1  0  2
6:  0.6  2  0  2  1  1
7:  0.7  1  1  1  2  1
8:  0.8  1  2  1  0  2
9:  0.9  1  1  2  1  1
A quick test on a larger table (80,001 rows with the seq call below) shows this is faster:
> dt = data.table(prob = (seq(0.1, 0.9, by = 0.00001)))
> system.time(for(i in 1:10) dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob])
user system elapsed
6.14 0.00 6.16
> system.time(for(i in 1:10) {mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2) ; cbind(dt, t(mat))})
user system elapsed
2.61 0.00 2.62
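Both are a major improvement over the original:
> df = as.data.frame(dt)  # assumption: df was not defined in the original; taken to be the same prob column as a data.frame
> system.time(for(i in 1:10) {df1 = mdply(df, rbinom, n = 5, size = 2) ; dt1 = data.table(df1)})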
user system elapsed
152.23 46.60 200.07
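One further variation, not part of the original answer and not timed here: rbinom recycles its prob argument across draws, so all draws can be requested in a single call and reshaped into one row per probability. A minimal sketch, where the names p, n_draws, draws and res are illustrative:

```r
library(data.table)

p <- seq(0.1, 0.9, by = 0.1)   # probabilities, one per output row
n_draws <- 5                   # draws per probability

# rep(p, each = n_draws) pairs one prob value with each draw,
# so a single vectorised rbinom call replaces the per-row calls.
draws <- matrix(
  rbinom(n = n_draws * length(p), size = 2, prob = rep(p, each = n_draws)),
  ncol = n_draws, byrow = TRUE  # row i holds the draws for p[i]
)

# Columns come out as prob, V1..V5, the same layout as the tables above.
res <- cbind(data.table(prob = p), as.data.table(draws))
res
```

Whether this beats the mapply version on a given machine would need to be checked with system.time in the same way as the comparisons above.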