SI model in RHadoop

Time: 2017-03-01 10:05:54

Tags: r graph igraph rhadoop

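I want to use the SI (susceptible-infected) model to measure how information spreads over my graph. I have defined a set of initially infected nodes, and I based my own code on this code: Susceptible-Infected model for network diffusion. However, when I run my code on a graph with 5000 nodes, it takes several hours to finish. Here is my code: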

library(igraph)

# Simulate an SI process on graph g.
#   g                 igraph graph whose vertices have a "name" attribute
#   transmission_rate probability that one infected neighbor infects a node in one step
#   diffusers         names of the initially infected vertices
# Returns a list whose t-th element holds the infected vertex names at step t.
get_infected1 = function(g, transmission_rate, diffusers){
  diffusers = as.character(diffusers)
  infected = list()
  infected[[1]] = diffusers
  Susceptible = setdiff(V(g)$name, diffusers)

  # number of successful transmissions out of freq independent contacts
  toss = function(freq) rbinom(1, freq, transmission_rate)

  update_diffusers = function(diffusers){
    # neighbors of all infected vertices in one call; a vertex appears
    # once for every infected neighbor it has
    nei = unlist(lapply(adjacent_vertices(g, diffusers), as_ids), use.names = FALSE)
    nei = nei[nei %in% Susceptible]
    # NULL signals that no susceptible vertex is reachable any more
    if (length(nei) == 0) return(NULL)
    # column 1: vertex name, column 2: number of infected neighbors
    nearest_neighbors = as.data.frame(table(nei), stringsAsFactors = FALSE)
    keep = unlist(lapply(nearest_neighbors[,2], toss))
    new_infected = nearest_neighbors[,1][keep >= 1]
    return(unique(c(diffusers, new_infected)))
  }

  # get infected nodes
  total_time = 1
  while(length(Susceptible) > 0){
    updated = update_diffusers(infected[[total_time]])
    # on a disconnected graph the remaining vertices can never be infected
    if (is.null(updated)) break
    infected[[total_time+1]] = sort(updated)
    Susceptible = setdiff(Susceptible, infected[[total_time+1]])
    total_time = total_time + 1
  }
  # return the infected nodes list
  return(infected)
}

Each infected node has a certain probability of infecting each of its neighbors, so the output is a list of the infected nodes at each step.
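To make the per-step rule concrete, here is a minimal sketch in base R. It assumes that every infected neighbor transmits independently with the same rate, so a node with k infected neighbors is infected with probability 1 - (1 - beta)^k. The names `infection_prob`, `draw_infections` and `k_infected` are hypothetical and used only for illustration.

```r
# Probability that a susceptible node with k infected neighbors becomes
# infected in one step, assuming independent contacts with rate beta.
infection_prob <- function(k, beta) 1 - (1 - beta)^k

# Vectorized draw for many susceptible nodes at once.
# k_infected: hypothetical vector of infected-neighbor counts, one per node.
draw_infections <- function(k_infected, beta) {
  runif(length(k_infected)) < infection_prob(k_infected, beta)
}

infection_prob(c(0, 1, 2), 0.5)  # 0.00 0.50 0.75
```

Drawing all nodes in one vectorized call avoids a per-node loop, which tends to matter once the graph has thousands of vertices.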

I want to adapt this code to run on RHadoop, but I am new to RHadoop. I don't know which parts I should modify, or how to load my graph into Hadoop. Any suggestions?
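One possible way to think about the question is to store the graph as an edge list and to write a single SI step as a map function and a reduce function. The sketch below is plain base R that imitates the map, shuffle and reduce phases locally; it does not call RHadoop. The edge list and the names `si_map`, `si_reduce` and `si_step` are assumptions made for illustration. With rmr2, the two functions would roughly correspond to the `map` and `reduce` arguments of `mapreduce()`, emitting pairs with `keyval()`, but that wiring is not shown here and has not been tested.

```r
# Hypothetical edge list; on Hadoop this could be a text file in HDFS
# with one "from<TAB>to" line per undirected edge.
edges <- data.frame(from = c("a", "a", "b", "c", "d"),
                    to   = c("b", "c", "d", "d", "e"),
                    stringsAsFactors = FALSE)

# Map: for every edge joining an infected and a susceptible vertex,
# emit the susceptible endpoint as key with value 1 (one contact).
si_map <- function(edge_chunk, infected) {
  from_inf <- edge_chunk$from %in% infected
  to_inf   <- edge_chunk$to %in% infected
  keys <- c(edge_chunk$to[from_inf & !to_inf],
            edge_chunk$from[to_inf & !from_inf])
  data.frame(key = keys, val = rep(1L, length(keys)), stringsAsFactors = FALSE)
}

# Reduce: all values for one susceptible vertex arrive together; the vertex
# becomes infected if at least one of its contacts transmits.
si_reduce <- function(key, vals, transmission_rate) {
  contacts <- sum(vals)
  if (rbinom(1, contacts, transmission_rate) >= 1) key else character(0)
}

# Local stand-in for one job: split the edges into chunks (map inputs),
# group the map output by key (shuffle), then reduce each group.
si_step <- function(edges, infected, transmission_rate, n_chunks = 2) {
  chunks <- split(edges, rep_len(seq_len(n_chunks), nrow(edges)))
  mapped <- do.call(rbind, lapply(chunks, si_map, infected = infected))
  if (nrow(mapped) == 0) return(infected)
  grouped <- split(mapped$val, mapped$key)
  newly <- unlist(lapply(names(grouped), function(k)
    si_reduce(k, grouped[[k]], transmission_rate)))
  sort(unique(c(infected, newly)))
}

step1 <- si_step(edges, infected = "a", transmission_rate = 1)
step2 <- si_step(edges, infected = step1, transmission_rate = 1)
```

Because each step depends on the infected set from the previous one, a driver loop would have to launch one such job per time step. For a graph of about 5000 nodes, the overhead of starting a job per step may outweigh any gain, so a vectorized single-machine version is worth trying first.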

0 answers:

No answers yet