I am trying to simulate the Chinese Restaurant process in R, and I am wondering whether this crude implementation can be made more efficient.
iTables = 200 # number of tables
iSampleSize = 1000 # number of diners
# initialize the list of tables
listTableOccupants = vector('list', iTables)
for (currentDiner in seq.int(iSampleSize)) {
  # occupation probabilities for the next diner
  vProbabilities = sapply(listTableOccupants,
                          function(x) ifelse(!is.null(x),
                                             length(x)/currentDiner,
                                             1/currentDiner))
  # pick the index of the lucky table
  iTable = sample.int(iTables, size = 1, prob = vProbabilities)
  # add to the list element corresponding to the table
  listTableOccupants[[iTable]] =
    c(listTableOccupants[[iTable]], currentDiner)
}
In particular, I am concerned about this line:
# add to the list element corresponding to the table
listTableOccupants[[iTable]] =
c(listTableOccupants[[iTable]], currentDiner)
How efficient is this?
Answer 0 (score: 0)
To avoid repeated memory reallocation and a sparse data structure, you can instead assign a table label to each diner. For example,
nDnr <- 100 # number of diners; must be at least 2
vDnrTbl <- rep(0, nDnr) # table label for each diner
alpha <- 2 # CRP parameter
vDnrTbl[1] <- 1
for (dnr in 2:length(vDnrTbl)) {
  # compute occupation probabilities for current diner
  vOcc <- table(vDnrTbl[1:(dnr-1)])
  vProb <- c(vOcc, alpha) / (dnr - 1 + alpha)
  # add table label to diner
  nTbl <- as.numeric(names(vOcc)[length(vOcc)]) # avoid overhead of finding max of possibly large vector
  vDnrTbl[dnr] <- sample.int(nTbl+1, size=1, prob=vProb)
}
From vDnrTbl, you can obtain listTableOccupants:
nTbl <- max(c(nTbl, vDnrTbl[dnr]))
listTableOccupants <- lapply(1:nTbl, function(t) which(vDnrTbl == t))
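The loop above calls table() on all previously seated diners at every step, so its cost grows roughly quadratically with the number of diners. One possible variation, sketched below under my own assumptions, keeps a running count per table instead. The function name simulate_crp and its argument names are hypothetical, introduced only for illustration; the seating rule is the same as in the code above (an occupied table is weighted by its occupancy, a new table by alpha).

```r
# Sketch only: simulate_crp is a hypothetical helper, not part of any package.
simulate_crp <- function(n_diners, alpha) {
  stopifnot(n_diners >= 1, alpha > 0)
  labels <- integer(n_diners)   # table label for each diner
  counts <- integer(n_diners)   # preallocated: at most n_diners tables can open
  n_tables <- 0L
  for (diner in seq_len(n_diners)) {
    # weights: occupancy of each open table, then alpha for a new table
    # (sample.int normalizes prob, so no explicit division is needed)
    weights <- c(counts[seq_len(n_tables)], alpha)
    chosen <- sample.int(n_tables + 1L, size = 1L, prob = weights)
    if (chosen > n_tables) n_tables <- n_tables + 1L  # a new table was opened
    counts[chosen] <- counts[chosen] + 1L
    labels[diner] <- chosen
  }
  list(labels = labels, counts = counts[seq_len(n_tables)])
}

set.seed(1)
res <- simulate_crp(1000, alpha = 2)
# recover the occupants of each table from the labels
listTableOccupants <- split(seq_along(res$labels), res$labels)
```

Each iteration here touches only the open tables rather than every seated diner, which should help when the number of diners is large. I have not benchmarked it, so treat the gain as an expectation rather than a measurement.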