Optimizing Spark GraphX to avoid out-of-memory errors

Time: 2018-11-27 09:48:10

Tags: java scala apache-spark spark-graphx

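I am trying to use GraphX to compute something like PageRank, and the GraphX iteration always fails with an out-of-memory (OOM) error. I am new to GraphX, so I would like to know whether there is any way to optimize this.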

The GraphX iteration code is:

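import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.graphx.TripletFields
import org.apache.spark.storage.StorageLevel

// For each destination vertex, sum the source value weighted by the edge attribute.
val simUpdates = graph.aggregateMessages[Double](
  ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr), _ + _, TripletFields.Src)

println("simRank mid iter")

// Damp the aggregated messages, then take the value from sameNode where one exists.
graph = graph.outerJoinVertices(simUpdates) {
  (_, oldSim, msgSumOpt) => msgSumOpt.getOrElse(0.0) * damp
}.outerJoinVertices(sameNode) {
  (_, oldSim, msgSumOpt) => msgSumOpt.getOrElse(oldSim)
}

// i % 1 is always 0, so this block runs on every iteration.
if (i % 1 == 0) {
  graph.edges.foreach(_ => ())  // run a job to materialize the edges
  println(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date) + s" iter $i finish")
  graph.persist(StorageLevel.MEMORY_ONLY_SER)
  graph.checkpoint()
}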
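
For comparison, iterative GraphX jobs are often written so that the new graph is cached and marked for checkpointing before the first action runs on it, and the previous iteration's graph is unpersisted once the new one is materialized (Spark's own Pregel loop follows a similar order). The following is only a minimal sketch of that ordering, not a confirmed fix for the OOM above. The toy edge list, the `damp` value, `checkpointInterval`, the checkpoint directory, and the `IterationSketch` object are all assumptions made for illustration, and the `sameNode` join is left out.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, TripletFields, VertexId}
import org.apache.spark.storage.StorageLevel

object IterationSketch {
  // Runs `iterations` propagation steps on a tiny hypothetical graph
  // and returns the final vertex values.
  def run(iterations: Int): Array[(VertexId, Double)] = {
    val conf = new SparkConf().setAppName("graphx-iteration-sketch").setMaster("local[2]")
    val sc = SparkContext.getOrCreate(conf)
    sc.setCheckpointDir("/tmp/graphx-checkpoints") // hypothetical location

    val damp = 0.8             // assumed damping factor
    val checkpointInterval = 2 // assumed interval; tune for the real job
    val ser = StorageLevel.MEMORY_ONLY_SER

    // Toy data: three vertices connected in a cycle.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 0.5), Edge(2L, 3L, 0.5), Edge(3L, 1L, 1.0)))
    // Set the serialized storage level at construction so cache() uses it.
    var graph: Graph[Double, Double] = Graph.fromEdges(edges, 1.0, ser, ser).cache()

    for (i <- 1 to iterations) {
      val prev = graph
      val updates = prev.aggregateMessages[Double](
        ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr), _ + _, TripletFields.Src)

      // Cache the new graph BEFORE any action touches it.
      val next = prev.outerJoinVertices(updates) {
        (_, _, msgSumOpt) => msgSumOpt.getOrElse(0.0) * damp
      }.cache()

      // Request the checkpoint before the first action as well,
      // so the lineage is cut when that job finishes.
      if (i % checkpointInterval == 0) next.checkpoint()

      // Materialize vertices and edges.
      next.vertices.count()
      next.edges.count()

      // The previous graph is no longer needed once `next` is cached.
      prev.unpersist(blocking = false)
      graph = next
    }

    val result = graph.vertices.collect()
    sc.stop()
    result
  }
}
```

Calling `IterationSketch.run(3)` executes three iterations locally. In a real job the graph would come from the actual data, and the interval would be chosen so that the lineage does not grow too long between checkpoints.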

0 answers:

No answers