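I have developed a simulation that, depending on the values of its input parameters, can be very slow (up to 6 hours).
Profiling my code shows that sample() is the bottleneck, but as far as I know there is no better function (sample() is already optimized for speed, since it is written in C).
Because the iterations of the algorithm depend on one another, parallelization does not work either.
Are there any alternatives that would improve the run time?
Any suggestions are very welcome.
Here is my simulation code: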
HAC.sim <- function(K = 1, N, Hstar, probs, m = 0, perms = 10000, p = 0.95){
  pop <- array(dim = c(c(perms, ceiling((1 - m) * N / K)), K))
  haps <- as.character(1:Hstar)
  specs <- 1:ceiling((1 - m) * N / K)
  for (j in 1:perms){
    for (i in 1:K){
      pop[j, specs, i] <- sample(haps, size = length(specs), replace = TRUE, prob = probs)
    }
  }
  HAC.mat <- array(dim = c(c(perms, length(specs), K)))
  for (k in specs){
    for (j in 1:perms){
      for (i in 1:K){
        ind.index <- sample(specs, size = k, replace = FALSE)
        hap.plot <- pop[sample(1:nrow(pop), size = 1, replace = TRUE), ind.index, sample(i, size = 1, replace = TRUE)]
        HAC.mat[j, k, i] <- length(unique(hap.plot))
      }
    }
  }
  means <- apply(HAC.mat, MARGIN = 2, mean)
  lower <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.025))
  upper <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.975))
  d <- assign("d", data.frame(specs, means), envir = .GlobalEnv)
  P <- max(means)
  Q <- Hstar - max(means)
  R <- assign("R", max(means) / Hstar, envir = .GlobalEnv)
  S <- (Hstar - max(means)) / Hstar
  Nstar <- assign("Nstar", (N * Hstar) / max(means), envir = .GlobalEnv)
  X <- ((N * Hstar) / max(means)) - N
  cat("\n Measures of Sampling Closeness \n \n Mean number of haplotypes sampled: ", P,
      "\n Mean number of haplotypes not sampled: ", Q,
      "\n Proportion of haplotypes sampled: ", R,
      "\n Proportion of haplotypes not sampled: ", S,
      "\n \n Calculated mean value of N*: ", Nstar,
      "\n Mean number of individuals not sampled: ", X, "\n \n")
  if (R < p){
    cat("Desired level of H* has not yet been reached \n")
  } else{
    cat("Desired level of H* has been reached")
  }
  par(mfrow = c(1, 2))
  plot(specs, means, type = "n", xlab = "Specimens sampled", ylab = "Unique haplotypes", ylim = c(1, Hstar))
  polygon(x = c(specs, rev(specs)), y = c(lower, rev(upper)), col = "gray")
  lines(specs, means, lwd = 2)
  HAC.bar <- barplot(length(specs) * probs, xlab = "Unique haplotypes", ylab = "Specimens sampled", names.arg = 1:Hstar)
}
N <- 50
Hstar <- 5
probs <- rep(1/Hstar, Hstar)
HAC.sim(N = N, Hstar = Hstar, probs = probs)
Answer 0 (score: 0)
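Rcpp is certainly an option, but it requires you to rewrite part of your code in C++. A less invasive option is the compiler package, which can give an easy speed-up: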
# old R version of lapply
slow_func <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  if (!is.list(X))
    X <- as.list(X)
  rval <- vector("list", length(X))
  for (i in seq(along = X))
    rval[i] <- list(FUN(X[[i]], ...))
  names(rval) <- names(X) # keep `names' !
  return(rval)
}
# Compiled versions
require(compiler)
slow_func_compiled <- cmpfun(slow_func)
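To see whether byte-compiling actually helps, you can time both versions on the same input. The following is only a minimal sketch using base R and the compiler package; toy_loop is a hypothetical stand-in for a loop-heavy function and is not taken from the question. Keep in mind that since R 3.4.0 the JIT compiler is enabled by default, so closures are usually byte-compiled automatically and the extra gain from cmpfun may be small.

```r
library(compiler)

# Hypothetical stand-in for a loop-heavy function (not from the question)
toy_loop <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i %% 7
  }
  total
}

# Explicitly byte-compiled version of the same function
toy_loop_compiled <- cmpfun(toy_loop)

# Both versions must return the same value
stopifnot(identical(toy_loop(1e5), toy_loop_compiled(1e5)))

# Compare elapsed times; with JIT enabled the difference may be negligible
system.time(toy_loop(1e6))
system.time(toy_loop_compiled(1e6))
```

If the two timings are essentially the same, the function was most likely already compiled by the JIT, and the remaining cost lies elsewhere (for example in the number of sample() calls).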
You can read more about it here. However, if the only problem is sample, Rcpp appears to have an alternative implementation. It does not seem to be faster in every case, though.
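Before switching implementations, it may also be worth reducing how often sample is called. As a sketch only (an assumption on my part, not something measured on your code): the first double loop in HAC.sim draws with replacement from the same haps and probs on every iteration, so all of those draws are independent and can be generated in a single call. The parameter values below mirror the example call in the question; n_specs is a hypothetical helper name.

```r
set.seed(1)  # only so that the sketch is reproducible

# Parameters mirroring the example call in the question
K <- 1; N <- 50; Hstar <- 5; m <- 0; perms <- 10000
probs <- rep(1 / Hstar, Hstar)

haps <- as.character(1:Hstar)
n_specs <- ceiling((1 - m) * N / K)  # hypothetical helper name

# One call to sample() replaces perms * K separate calls.
# The draws are i.i.d., so the order in which they fill the array
# does not change the distribution of pop.
pop <- array(
  sample(haps, size = perms * n_specs * K, replace = TRUE, prob = probs),
  dim = c(perms, n_specs, K)
)
```

This produces an array with the same dimensions and the same distribution as the loop, although not the same values for a given seed. It only covers the first loop; the second loop samples without replacement with a different size k each time, so it does not collapse into one call in the same way.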