Question

I am running a discrete event simulation in R. The "heart" of my algorithm does the following (pseudocode):

1) Iterate over events

a) Change resources according to event[i]
b) Change resources according to the result of step a)

The following reproducible example covers the main aspects:

Generate some data:

set.seed(4)
n <- 3
nr_resources <- 2

events <- data.frame(
  t = as.integer(trunc(cumsum(rexp(n)))),
  resource = NA,
  worktime = as.integer(trunc(runif(n)*10))
)

resources <- data.frame(
  id = 1:nr_resources,
  t_free = 0L
)
events
resources

# > events
# t resource worktime
# 0       NA        2
# 4       NA        8
# 5       NA        2
# > resources
# id t_free
#  1      0
#  2      0

Now we can simulate the scheduling of the resources:

for (i in 1:n) {
  events$resource[i] <- resources$id[resources$t_free <= events$t[i]][1]
  resources$t_free[events$resource[i]] <- events$t[i] + events$worktime[i]
}

events
resources

# > events
# t resource worktime
# 0        1        2
# 4        1        8
# 5        2        2
# > resources
# id t_free
#  1     12
#  2      7

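This approach works well, but I would like to eliminate some drawbacks. Because events and resources are split into two datasets, there is quite a lot of subsetting (search and replace) across both of them. This is not really readable. In the real application it even becomes a performance bottleneck. (Of course, the real example is considerably more complex.)

So I asked myself whether there is a better way to accomplish this task in R.

I considered replacing the for loop with an ordinary higher-order function, but got nowhere.

The typical R lapply approach does not work, because lapply is not built for this kind of iterative change to the input data (as far as I can see).
My task looks a bit like a Reduce pattern. Since Reduce(sum, 1:3, accumulate = TRUE) uses intermediate results and keeps them, I thought I could use the Reduce function, but I did not get any results.

I also considered restructuring my data, but so far without success.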

What I tried in detail

On the algorithm side:

Failing attempt using lapply:

l <- list(events = events, resources = resources)
l <- lapply(l, function(x) {
  l$events$resource <- l$resources$id[l$resources$t_free <= l$events$t][1]
  l$resources$t_free[l$events$resource] <- l$events$t + l$events$worktime
  return(l)
})

l$events
l$resources

The result becomes:

# $events
# t resource worktime
# 1 0        1        2
# 2 4        1        8
# 3 5        1        2
# 
# $resources
# id t_free
# 1  1      7
# 2  2      0

The intermediate changes to the resources are lost, so resource 1 is booked every time.

Failing attempt using Reduce:

l <- list(events = events, resources = resources)
l <- Reduce(function(l) {
  l$events$resource <- l$resources$id[l$resources$t_free <= l$events$t][1]
  l$resources$t_free[l$events$resource] <- l$events$t + l$events$worktime
  return(l)}, l, accumulate = TRUE)

This fails with:

Error in f(init, x[[i]]) : unused argument (x[[i]])
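The error message reflects how Reduce calls its function: with two arguments, the accumulated value and the next element of x. A minimal sketch of that calling convention, assuming the state list is passed as init and the event row indices as x. The function name step is hypothetical, and the example data is hard-coded from the printed output above so the block does not depend on the random number generator:

```r
# Example data, hard-coded from the printed output in the question
events <- data.frame(t = c(0L, 4L, 5L),
                     resource = NA_integer_,
                     worktime = c(2L, 8L, 2L))
resources <- data.frame(id = 1:2, t_free = 0L)

# Reduce calls f(accumulator, element), so the function takes two parameters:
# l is the state carried from one event to the next, i is the current event row
step <- function(l, i) {
  l$events$resource[i] <- l$resources$id[l$resources$t_free <= l$events$t[i]][1]
  l$resources$t_free[l$events$resource[i]] <- l$events$t[i] + l$events$worktime[i]
  l
}

l <- Reduce(step,
            x = seq_len(nrow(events)),
            init = list(events = events, resources = resources))

l$events$resource    # 1 1 2
l$resources$t_free   # 12 7
```

Without accumulate = TRUE only the final state is returned, which is enough here because the state list already records the chosen resource for every event.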

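On the data side:

Another approach I could think of is to change the data so that it is represented in a single dataset, for example by multiplying the events by the number of resources. I tried the following: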

data <- merge(events, resources)
data <- data[order(data$t), ]
data

# t resource worktime id t_free
# 0       NA        2  1      0
# 0       NA        2  2      0
# 4       NA        8  1      0
# 4       NA        8  2      0
# 5       NA        2  1      0
# 5       NA        2  2      0

for (i in seq_len(nrow(data))) {  # iterate over rows; seq_along(data) would iterate over columns
  if ( is.na(data$resource[i])) {
    data$resource[data$t == data$t[i]] <- data$id[data$t_free <= data$t[i]][1]
    data$t_free[data$id == data$resource[i]] <- data$t[i] + data$worktime[i]
  }
}

data
# t resource worktime id t_free
# 0        1        2  1     12
# 0        1        2  2      7
# 4        1        8  1     12
# 4        1        8  2      7
# 5        2        2  1     12
# 5        2        2  2      7

events <- unique(data[,1:3])
events
# t resource worktime
# 0        1        2
# 4        1        8
# 5        2        2

resources <- unique(data[,4:5])
resources
# id t_free
#  1     12
#  2      7

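This also works, but I am not sure whether it will bring better performance, readability, and changeability when scaled up.

So my question is:

Are there other approaches, on the algorithm side or the data side, that could improve my actual solution?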

Answer 1

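Honestly, I prefer your first for loop, and you should consider using something like Rcpp::sourceCpp and migrating your logic to C++. I think that would be both readable and faster. If you have to stay in R, here is one possibility: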

t_free <- Reduce(x = 1L:n,
                 init = rep(0L, nr_resources),
                 accumulate = TRUE,
                 f = function(t_free, i) {
                   # which.max will return the location of the first TRUE
                   id <- which.max(t_free <= events$t[i])
                   # R makes a local copy of t_free here
                   t_free[id] <- events$t[i] + events$worktime[i]
                   # return the chosen resource for this "iteration"
                   attr(t_free, "resource") <- id
                   # return the modified copy
                   t_free
                 })

# fill the events$resource column by extracting the resource attribute, ignoring init
events$resource <- sapply(t_free[-1L], attr, "resource")
# your resources$t_free column in the last element
resources <- data.frame(id = 1L:nr_resources,
                        t_free = t_free[[n + 1L]])
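
To check that the Reduce version reproduces the original loop, the two can be run side by side. This is a minimal sketch, assuming the example data from the question (hard-coded here so the block does not depend on the random number generator). The helper names schedule_loop and schedule_reduce are hypothetical, and the loop variant works on plain vectors rather than data frame columns:

```r
# Example data from the question, hard-coded for reproducibility
events <- data.frame(t = c(0L, 4L, 5L), worktime = c(2L, 8L, 2L))
nr_resources <- 2L

# Plain for loop that keeps its state in vectors
schedule_loop <- function(events, nr_resources) {
  t_free <- rep(0L, nr_resources)
  resource <- integer(nrow(events))
  for (i in seq_len(nrow(events))) {
    id <- which(t_free <= events$t[i])[1]            # first free resource
    resource[i] <- id
    t_free[id] <- events$t[i] + events$worktime[i]   # book it until the job ends
  }
  list(resource = resource, t_free = t_free)
}

# Reduce variant: t_free is the accumulator, the chosen id travels as an attribute
schedule_reduce <- function(events, nr_resources) {
  states <- Reduce(
    f = function(t_free, i) {
      id <- which.max(t_free <= events$t[i])
      t_free[id] <- events$t[i] + events$worktime[i]
      attr(t_free, "resource") <- id
      t_free
    },
    x = seq_len(nrow(events)),
    init = rep(0L, nr_resources),
    accumulate = TRUE
  )
  list(
    resource = vapply(states[-1L], attr, integer(1), "resource"),
    t_free = as.vector(states[[length(states)]])     # as.vector drops the attribute
  )
}

res_loop <- schedule_loop(events, nr_resources)
res_reduce <- schedule_reduce(events, nr_resources)
identical(res_loop, res_reduce)   # TRUE
```

The two variants agree only while at least one resource is free for every event: which.max returns 1 when no element is TRUE, whereas which(...)[1] returns NA in that case.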

Alternatives to for loops and cross-dataset subsetting (using higher-order functions or alternative data structures)
