Question

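I'm using R for an agent-based historical simulation, and the code works, but slowly. It loops through time steps, updating a data frame of agent attributes and another data frame that summarizes the overall state after each time step (one generation). Above that loop are several runs for each different parameter setting. Although it starts with 100 agents, in the extreme setting (high S, low A) the population can grow to over a thousand after, say, five generations. I read that updating matrices is faster than updating data frames, so I converted the summary to a matrix. But I have also heard that vectorization is best, so before I change the agents to a matrix as well, I'd like to know whether anyone can suggest a way to make this more vectorized. Here is the code: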

NextGeneration <- function(agent, N, S, A) {
   # N is number of agents.
   # S is probability that an agent with traditional fertility will have 2 sons surviving to the age of inheritance.
   # A is probability that an heir experiencing division of estate changes his fertility preference from traditional to planned.
   # find number of surviving heirs for each agent
   excess <- runif(N)  # get random numbers 
   heir <- rep(1, N)  # everyone has at least 1 surviving heir 

   # if agent has traditional fertility 2 heirs may survive to inherit
   heir[agent$fertility == "Trad" & excess < S] <- 2  

   # next generation more numerous if spare heirs survive

   # new agents have vertical inheritance but also guided variation. 
   # first append to build a vector, then combine into new agent dataframe  
   nextgen.fertility <- NULL
   nextgen.lineage <- NULL

   for (i in 1:N) {

      if (heir[i]==2) {

         # two agents inherit from one parent.
         for (j in 1:2) {

            # A is probability of inheritance division event affecting fertility preference in new generation.
            if (A > runif(1)) {
               nextgen.fertility <- c(nextgen.fertility, "Plan") 
            } else {
               nextgen.fertility <- c(nextgen.fertility, agent$fertility[i])
            }
            nextgen.lineage <- c(nextgen.lineage, agent$lineage[i])
         }
      } else {
         nextgen.fertility <- c(nextgen.fertility, agent$fertility[i])
         nextgen.lineage <- c(nextgen.lineage, agent$lineage[i])
      }
   }
   # assemble new agent frame  
   nextgen.agent <- data.frame(nextgen.fertility, nextgen.lineage, stringsAsFactors = FALSE) 
   names(nextgen.agent) <- c("fertility", "lineage")
   nextgen.agent
}

So the agents start like this (Trad = traditional):

ID      fertility   lineage
1       Trad        1
2       Trad        2
3       Trad        3
4       Trad        4
5       Trad        5

After a few steps (generations) of random change, the result looks like this:

ID      fertility   lineage
1       Plan       1
2       Plan       1
3       Trad       2
4       Plan       3
5       Trad       3
6       Trad       4
7       Plan       4
8       Plan       4
9       Plan       4
10      Plan       5
11      Trad       5
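The question also mentions a second structure that summarizes the population after each generation and is now stored as a matrix. A minimal sketch of filling such a matrix without looping over agents, assuming the summary tracks population size, the counts of "Plan" and "Trad" agents, and the number of surviving lineages (the function name SummariseGeneration and these columns are hypothetical, not from the question):

```r
# Hypothetical per-generation summary: every statistic is a vectorized
# count over the whole agent data frame, so no per-agent loop is needed.
SummariseGeneration <- function(agent) {
  c(pop      = nrow(agent),
    plan     = sum(agent$fertility == "Plan"),
    trad     = sum(agent$fertility == "Trad"),
    lineages = length(unique(agent$lineage)))
}

# Agents from the example table above (after a few generations)
agent <- data.frame(
  fertility = c("Plan", "Plan", "Trad", "Plan", "Trad", "Trad",
                "Plan", "Plan", "Plan", "Plan", "Trad"),
  lineage   = c(1, 1, 2, 3, 3, 4, 4, 4, 4, 5, 5),
  stringsAsFactors = FALSE
)

# Preallocate the summary matrix once (one row per generation) and fill
# rows in place instead of growing it with rbind().
n_gen <- 5
summary_mat <- matrix(NA_real_, nrow = n_gen, ncol = 4,
                      dimnames = list(NULL, c("pop", "plan", "trad", "lineages")))
summary_mat[1, ] <- SummariseGeneration(agent)
print(summary_mat[1, ])  # pop 11, plan 7, trad 4, lineages 5 for the table above
```

Preallocating with a fixed number of rows is what makes the matrix cheap to update; the remaining rows stay NA until later generations fill them.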

Answer 1

Actually, it would be more efficient to encode fertility as 0 and 1, and you could even use an integer matrix.
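To illustrate the encoding suggestion, here is a sketch that stores agents in an integer matrix with fertility coded as 1L for "Trad" and 0L for "Plan". The function name NextGenMat and the column names are assumptions for illustration; the logic mirrors the vectorized data.frame approach in this answer (append a second row for each two-heir parent, then redraw fertility for both heirs).

```r
# Hypothetical integer-matrix variant: column "fertility" is 1L for
# traditional ("Trad") and 0L for planned ("Plan"); column "lineage" is an id.
NextGenMat <- function(agent, S, A) {
  N <- nrow(agent)
  # traditional parents have two surviving heirs with probability S
  two <- which(agent[, "fertility"] == 1L & runif(N) < S)
  # every parent keeps one row; two-heir parents get an extra row appended
  nextgen <- agent[c(seq_len(N), two), , drop = FALSE]
  # both heirs of a divided estate switch to planned (0L) with probability A;
  # when `two` is empty, `heirs` is empty and the assignment is a no-op
  heirs <- c(two, N + seq_along(two))
  nextgen[heirs, "fertility"] <- ifelse(runif(length(heirs)) < A, 0L, 1L)
  nextgen
}

agentMat <- cbind(fertility = rep(1L, 100), lineage = seq_len(100))
set.seed(1)
nextMat <- NextGenMat(agentMat, S = 0.8, A = 0.8)
```

Because both columns are integers, the whole population stays a single integer matrix, and subsetting or assigning rows avoids the per-column overhead of data.frame methods.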

In any case, the existing code can be simplified a lot, so here is a vectorized solution that still uses your data.frame:

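NextGen <- function(agent, N, S, A) {
  excess <- runif(N)
  # parents with traditional fertility that have two surviving heirs
  v1 <- which(agent$fertility == "Trad" & excess < S)
  # keep one row per parent, then append a second row for each two-heir parent
  nextgen.agent <- agent[c(seq_len(N), v1), ]
  # both heirs of a divided estate switch to "Plan" with probability A;
  # the guard skips the assignment when no parent has two heirs
  if (length(v1) > 0) {
    nextgen.agent[c(v1, N + seq_along(v1)), "fertility"] <- ifelse(A > runif(length(v1)*2), "Plan", "Trad")
  }
  nextgen.agent
}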
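This version appends the second heir of each two-heir parent at the bottom of the frame, so heirs of the same parent are no longer adjacent as they are in the question's example output. If that ordering matters, a sketch using rep() with a per-parent heir count keeps each parent's heirs together. NextGenRep is a hypothetical name; the heir vector follows the question's definition, and random draws are made for every row but only applied to heirs of divided estates, which gives the same probabilities as the question's per-heir runif(1).

```r
# Sketch: vectorized next generation that keeps each parent's heirs adjacent.
NextGenRep <- function(agent, N, S, A) {
  # number of surviving heirs per parent, as in the question's code
  heir <- rep(1L, N)
  heir[agent$fertility == "Trad" & runif(N) < S] <- 2L
  # repeat each parent's row once per heir, e.g. 1, 2, 2, 3, ...
  idx <- rep(seq_len(N), times = heir)
  nextgen.agent <- agent[idx, , drop = FALSE]
  # heirs of a divided estate switch to "Plan" with probability A
  divided <- heir[idx] == 2L
  switched <- divided & runif(length(idx)) < A
  nextgen.agent$fertility[switched] <- "Plan"
  rownames(nextgen.agent) <- NULL
  nextgen.agent
}

agentDF <- data.frame(fertility = "Trad", lineage = 1:5, stringsAsFactors = FALSE)
set.seed(42)
print(NextGenRep(agentDF, nrow(agentDF), S = 0.8, A = 0.8))
```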

Testing with a sample agent data frame:

agentDF <- data.frame(fertility = "Trad", lineage = 1:50, stringsAsFactors = FALSE)

# use the microbenchmark package to compare performance
microbenchmark::microbenchmark(
  base = {
    res1 <- NextGeneration(agentDF, 50, 0.8, 0.8) # note I fixed the two variable typos in your function
  }, 
  new = {
    res2 <- NextGen(agentDF, 50, 0.8, 0.8)
  }, 
  times = 100
)

## Unit: microseconds
## expr      min        lq     mean    median       uq       max neval
## base 1998.533 2163.8605 2446.561 2222.8200 2286.844 14413.173   100
##  new  282.032  304.1165  329.552  320.3255  348.488   467.217   100
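With the per-generation step vectorized, the remaining loops described in the question (generations within runs within parameter settings) can stay as plain R loops. A minimal sketch of that outer structure, assuming the NextGen step from this answer (restated here so the block runs on its own) and a hypothetical summary of population size and share of "Plan" agents per generation; RunSimulation, its defaults, and the parameter grid are illustrative, not from the question.

```r
# Vectorized step from this answer, with the guard for the case where no
# agent has two heirs.
NextGen <- function(agent, N, S, A) {
  v1 <- which(agent$fertility == "Trad" & runif(N) < S)
  nextgen.agent <- agent[c(seq_len(N), v1), ]
  if (length(v1) > 0) {
    nextgen.agent[c(v1, N + seq_along(v1)), "fertility"] <-
      ifelse(A > runif(2 * length(v1)), "Plan", "Trad")
  }
  nextgen.agent
}

# Hypothetical driver: several runs per (S, A) setting, n_gen generations
# per run, with the summary matrix preallocated and filled in place.
RunSimulation <- function(S, A, n_gen = 5, n_runs = 3, N0 = 100) {
  out <- matrix(NA_real_, nrow = n_runs * n_gen, ncol = 4,
                dimnames = list(NULL, c("run", "gen", "pop", "plan_share")))
  row <- 0
  for (run in seq_len(n_runs)) {
    agent <- data.frame(fertility = "Trad", lineage = seq_len(N0),
                        stringsAsFactors = FALSE)
    for (gen in seq_len(n_gen)) {
      agent <- NextGen(agent, nrow(agent), S, A)
      row <- row + 1
      out[row, ] <- c(run, gen, nrow(agent), mean(agent$fertility == "Plan"))
    }
  }
  out
}

# Loop over a small, illustrative grid of parameter settings
settings <- expand.grid(S = c(0.2, 0.9), A = c(0.1, 0.9))
results <- lapply(seq_len(nrow(settings)), function(k)
  RunSimulation(settings$S[k], settings$A[k]))
```

Passing nrow(agent) as N on every call matters here, because the population grows from one generation to the next.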

Nested loop simulation in R runs slowly
