Question

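I need to implement the HDRF algorithm (HDRF: Stream-Based Partitioning for Power-Law Graphs) in Scala using Apache Spark GraphX. I have the following dataset -

How can I implement this algorithm? Any hints would be appreciated.

If needed, I also have this PDF that may help you: http://slideshare.net/AndreaCingolani/03-hdrf-presentation. So far I have been able to load the graph, but I don't understand how to do the partitioning in HDRF. I tried implementing the algorithm in GraphX's PartitionStrategy class, with poor results.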

/* EDIT */

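import org.apache.spark._
import org.apache.spark.graphx._
// HDRF is not a built-in GraphX strategy; this import resolves only if the
// HDRF object below is added to PartitionStrategy's companion object.
import org.apache.spark.graphx.PartitionStrategy.HDRF

/**
  * Main
  */
object Main {

  def main(args: Array[String]): Unit = {

    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("HDRF"))

    sc.setLogLevel("ERROR")

    // Load the graph from an edge list file (one "srcId dstId" pair per line).
    val graph = GraphLoader.edgeListFile(sc, "data/sample_graph.txt")

    // partitionBy returns a repartitioned graph, not a partition count.
    val partitionedGraph = graph.partitionBy(HDRF)

  }
}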




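case object HDRF extends PartitionStrategy {

    // State HDRF needs while streaming edges, hard-coded here for 8 partitions.
    val partSize: Array[Int] = new Array[Int](8)
    var vertexAssignment: Array[VertexId] = new Array[VertexId](8)

    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
        ??? // unimplemented: this is the part I don't know how to write
    }

}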

I don't know what to write inside the HDRF object in GraphX's PartitionStrategy class.

Answer 1

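I'm afraid the arrays partSize and vertexAssignment will not be maintained globally across workers. As far as I know, the HDRF algorithm needs both pieces of information to be up to date for every edge it places.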
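To make the state requirement concrete, here is a minimal sequential sketch in plain Scala (no Spark) of the greedy scoring as I understand it from the HDRF paper: each edge goes to the partition that maximizes a replication term plus a balance term. The names HdrfPartitioner, assign, sizes and HdrfDemo are hypothetical, the defaults for lambda and epsilon are assumptions, and ties are broken by lowest partition index for simplicity. Treat it as an illustration of why the state has to live in one place, not as a drop-in PartitionStrategy.

```scala
import scala.collection.mutable

// Hypothetical sequential HDRF sketch: all state lives in one object on one JVM.
final class HdrfPartitioner(numParts: Int, lambda: Double = 1.0, epsilon: Double = 1.0) {
  require(numParts > 0, "numParts must be positive")

  // Number of edges assigned to each partition so far.
  private val partSize = new Array[Long](numParts)
  // Partial degree of each vertex, counted over the edges seen so far.
  private val partialDegree = mutable.Map.empty[Long, Long].withDefaultValue(0L)
  // For each vertex, the set of partitions that already hold a replica of it.
  private val replicas = mutable.Map.empty[Long, mutable.BitSet]

  // Replication gain of vertex v on partition p; theta is v's normalized partial degree.
  private def gain(v: Long, theta: Double, p: Int): Double =
    if (replicas.get(v).exists(_.contains(p))) 1.0 + (1.0 - theta) else 0.0

  /** Assigns one streamed edge to a partition and updates the shared state. */
  def assign(src: Long, dst: Long): Int = {
    partialDegree(src) += 1
    partialDegree(dst) += 1
    val dSrc = partialDegree(src).toDouble
    val dDst = partialDegree(dst).toDouble
    val thetaSrc = dSrc / (dSrc + dDst)
    val thetaDst = 1.0 - thetaSrc

    val maxSize = partSize.max
    val minSize = partSize.min

    // Pick the highest-scoring partition; ties go to the lowest index.
    var best = 0
    var bestScore = Double.NegativeInfinity
    for (p <- 0 until numParts) {
      val replicationScore = gain(src, thetaSrc, p) + gain(dst, thetaDst, p)
      val balanceScore = lambda * (maxSize - partSize(p)) / (epsilon + maxSize - minSize)
      val score = replicationScore + balanceScore
      if (score > bestScore) { best = p; bestScore = score }
    }

    partSize(best) += 1
    replicas.getOrElseUpdate(src, mutable.BitSet.empty) += best
    replicas.getOrElseUpdate(dst, mutable.BitSet.empty) += best
    best
  }

  /** Snapshot of the current partition sizes. */
  def sizes: Vector[Long] = partSize.toVector
}

object HdrfDemo {
  def main(args: Array[String]): Unit = {
    val edges = Seq((1L, 2L), (1L, 3L), (1L, 4L), (2L, 3L))
    val hdrf = new HdrfPartitioner(numParts = 2)
    val assignment = edges.map { case (s, d) => hdrf.assign(s, d) }
    println(s"assignment = $assignment, sizes = ${hdrf.sizes}")
  }
}
```

Every call to assign reads and mutates partSize, partialDegree and replicas, which is exactly what a getPartition(src, dst, numParts) call running independently on each worker cannot share. One possible workaround for small graphs (my assumption, not something built into GraphX) is to run this loop in a single place, for example over graph.edges.collect() on the driver, and then use the resulting edge-to-partition mapping to place the edges.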
