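I want to run a discrete-time simulation (a simplified version is below). I generate a data frame of population members (one row per member) containing the timestamps at which each member enters and exits a website. I then want to count, at each time interval, how many members are on the site.
Currently I loop over time and, at each second, count the members who have entered but not yet exited. (I also tried a destructive iteration that removes exited members at each time step, which takes even longer. I also understand that I could use larger time intervals in the loop.)
How can I use linear algebra to eliminate the for loop and the excess run time? My current approach does not scale well as the population grows, and of course it is linear in the simulation duration.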
popSize = 10000
simDuration = 10000
enterTimestamp <- rexp(n = popSize, rate = .001)
exitTimestamp <- enterTimestamp + rexp(n = popSize, rate = .001)
popEvents <- data.frame(cbind(enterTimestamp,exitTimestamp))
visitorLoad <- integer(length = simDuration)
for (i in 1:simDuration) {
  visitorLoad[i] <- sum(popEvents$enterTimestamp <= i &
                        popEvents$exitTimestamp > i)
  if (i %% 100 == 0) {
    print(paste('Sim at', i, 'out of', simDuration, 'seconds.', sep = ' '))
  }
}
plot(visitorLoad, typ = 'l', ylab = 'Visitor Load', xlab='Time Elapsed (sec)')
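Answer 0 (score: 4)
You can count the visitors who enter and exit at each time step, then use a cumulative sum to get the number of visitors present at any given time. This seems to meet your requirement for fast-running code, although it does not use linear algebra.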
diffs = rep(0, simDuration+1)
# Store the number of times a visitor enters and exits at each timestep. The table
# will contain headers that are the timesteps and values that are the number of
# people entering or exiting at the timestep.
tabEnter = table(pmax(1, ceiling(enterTimestamp)))
tabExit = table(pmin(simDuration+1, ceiling(exitTimestamp)))
# For each time index, add the number of people entering and subtract the number of
# people exiting. For instance, if in period 20, 3 people entered and 4 exited, then
# diffs[20] equals -1. as.numeric(names(tabEnter)) is the periods for which at least
# one person entered, and tabEnter is the number of people in each of those periods.
diffs[as.numeric(names(tabEnter))] = diffs[as.numeric(names(tabEnter))] + tabEnter
diffs[as.numeric(names(tabExit))] = diffs[as.numeric(names(tabExit))] - tabExit
# cumsum() sums the diffs vector through a particular time point.
visitorLoad2 = head(cumsum(diffs), simDuration)
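To see why the cumulative sum reproduces the loop, here is a minimal sketch on made-up toy data (three hypothetical visitors, a 5-second simulation). It uses base R's tabulate() in place of table(), which is my substitution rather than part of the answer above; the names enter, exit, dur, brute and fast are illustrative only.

```r
# Toy data (hypothetical): three visitors over a 5-second simulation.
enter <- c(0.5, 1.2, 3.7)
exit  <- c(2.5, 4.1, 9.0)   # the third visitor leaves after the simulation ends
dur   <- 5

# Brute-force count, same condition as the loop in the question.
brute <- sapply(1:dur, function(i) sum(enter <= i & exit > i))

# Difference vector: +1 at the first second a visitor is counted,
# -1 at the first second they are no longer counted.
inIdx  <- pmax(1, ceiling(enter))
outIdx <- pmin(dur + 1, ceiling(exit))
diffs  <- tabulate(inIdx, nbins = dur + 1) - tabulate(outIdx, nbins = dur + 1)

# Running total of arrivals minus departures, trimmed to the simulation length.
fast <- head(cumsum(diffs), dur)

print(brute)  # 1 2 1 2 1
print(fast)   # 1 2 1 2 1
```

A visitor with enter <= i is first counted at second ceiling(enter), and stops being counted at second ceiling(exit), which is why both timestamps are rounded up before tallying.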
Answer 1 (score: 1)
For simplicity, how about this:
vl<-unlist(lapply(1:simDuration,function(i)sum((enterTimestamp<=i)*(exitTimestamp>i))))
plot(vl, typ = 'l', ylab = 'Visitor Load', xlab='Time Elapsed (sec)')
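This is about twice as fast as the current loop, but if performance matters more, then @josilber's solution is better, or there may be something possible with data.table(); I'll have a think...
Edit - how about this for speed: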
require(data.table)
require(plyr) # for count() function
system.time({
  enter <- data.table(count(ceiling(enterTimestamp))) # entries grouped by second
  exit <- data.table(count(ceiling(exitTimestamp))) # exits grouped by second
  sim <- data.table(x = 1:simDuration) # time index
  merged <- merge(merge(sim, enter, by = "x", all.x = TRUE), exit, by = "x", all.x = TRUE)
  mat <- data.matrix(merged[, list(freq.x, freq.y)]) # make matrix to remove NAs
  mat[is.na(mat)] <- 0 # remove NAs, there are quicker ways but more complicated
  vl <- cumsum(mat[, 1] - mat[, 2]) # cumsum() to roll up the movements
})
   user  system elapsed
   0.02    0.00    0.02
plot(vl, typ = 'l', ylab = 'Visitor Load', xlab='Time Elapsed (sec)')
**Further edit** - a balance between performance and simplicity
system.time(cumsum(data.frame(table(cut(enterTimestamp,0:simDuration))-table(cut(exitTimestamp,0:simDuration)))[,2]))
   user  system elapsed
   0.09    0.00    0.10
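If you want to convince yourself that the cut()-based one-liner matches the original loop, a rough check along these lines should do. This is only a sketch: the seed and the smaller popSize and simDuration are arbitrary choices of mine to keep the brute-force part quick, and loopLoad and cutLoad are made-up names.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
popSize     <- 1000  # smaller than in the question so the check runs fast
simDuration <- 2000

enterTimestamp <- rexp(n = popSize, rate = .001)
exitTimestamp  <- enterTimestamp + rexp(n = popSize, rate = .001)

# Reference result: the per-second count from the question's loop.
loopLoad <- vapply(seq_len(simDuration),
                   function(i) sum(enterTimestamp <= i & exitTimestamp > i),
                   integer(1))

# cut() bins each timestamp into (0,1], (1,2], ..., i.e. by its ceiling;
# timestamps beyond simDuration become NA and are dropped by table().
breaks  <- 0:simDuration
cutLoad <- cumsum(as.vector(table(cut(enterTimestamp, breaks)) -
                            table(cut(exitTimestamp, breaks))))

stopifnot(all(loopLoad == cutLoad))
```

The two agree because cut() uses right-closed intervals by default, so a timestamp in (i-1, i] lands in bin i, the same second at which the loop first sees the visitor enter or leave.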