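I want to use the SI model to measure how information spreads on my graph. I have defined a set of initially infected nodes. I based my code on this example: Susceptible-Infected model for network diffusion. However, when I run my code on a graph with 5000 nodes, it takes several hours to finish. Here is my code: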
library(igraph)

# g must have a vertex `name` attribute; diffusers are vertex names
get_infected1 = function(g, transmission_rate, diffusers){
  infected = list()
  Susceptible <- setdiff(V(g)$name, diffusers)
  # toss `freq` biased coins and return the number of successes
  toss = function(freq) {
    tossing = numeric(freq)
    coins = c(1, 0)
    probabilities = c(transmission_rate, 1 - transmission_rate)
    for (i in seq_len(freq)) tossing[i] = sample(coins, 1, replace = TRUE, prob = probabilities)
    return(sum(tossing))
  }
  infected[[1]] = diffusers
  # one time step: every infected node tries to infect its susceptible neighbors
  update_diffusers = function(diffusers){
    Nei <- character(0)
    for (i in seq_along(diffusers)){
      L <- as.character(diffusers[i])
      # look neighbors up by name so they can be compared with Susceptible
      Nei1 <- unique(neighbors(g, L)$name)
      Nei <- c(Nei, intersect(Susceptible, Nei1))
    }
    # no susceptible neighbor left: nothing can change in this step
    if (length(Nei) == 0) return(diffusers)
    # Freq = number of infected neighbors of each susceptible node
    nearest_neighbors = data.frame(table(Nei))
    keep = unlist(lapply(nearest_neighbors[,2], toss))
    new_infected = as.character(nearest_neighbors[,1][keep >= 1])
    diffusers = unique(c(diffusers, new_infected))
    return(diffusers)
  }
  # get infected nodes
  # note: this loop only ends if every node can be reached from the diffusers
  total_time = 1
  while(length(Susceptible) > 0){
    infected[[total_time+1]] = sort(update_diffusers(infected[[total_time]]))
    Susceptible <- setdiff(Susceptible, infected[[total_time+1]])
    total_time = total_time + 1
  }
  # return the infected nodes list
  return(infected)
}
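Each infected node has a certain probability of infecting each of its neighbors, so the output is a list containing the infected nodes at every step.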
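To make the per-step rule concrete, here is a minimal sketch of a single SI step in base R. It assumes the graph is available as a two-column character edge matrix (which I believe can be obtained with igraph's `as_edgelist(g)`) and that the graph is undirected. The function name `si_step` and its arguments are only illustrative and are not part of my code above.

```r
# One synchronous SI step on an undirected graph (illustrative sketch).
# edges:    two-column character matrix, one row per undirected edge (assumed format)
# infected: character vector with the names of the currently infected nodes
# rate:     per-contact transmission probability
si_step <- function(edges, infected, rate) {
  # list every edge in both directions so each pair reads "from -> to"
  from <- c(edges[, 1], edges[, 2])
  to   <- c(edges[, 2], edges[, 1])
  # keep only contacts going from an infected node to a susceptible one
  contact <- from %in% infected & !(to %in% infected)
  targets <- to[contact]
  if (length(targets) == 0) return(infected)
  # one independent Bernoulli trial per infected-susceptible contact
  success <- runif(length(targets)) < rate
  unique(c(infected, targets[success]))
}

edges <- rbind(c("a", "b"), c("b", "c"), c("c", "d"))
si_step(edges, infected = "a", rate = 1)   # with rate = 1, "a" always infects "b"
```

A susceptible node with several infected neighbors appears once per such neighbor in `targets`, so it gets one trial per contact, which is the same rule my `toss` function is meant to apply.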
I would like to adapt this code to run on RHadoop, but I am new to RHadoop. I don't know which parts I should modify, or how to load my graph into Hadoop. Any suggestions?