How to use GraphX to sum edge weights per vertex

Asked: 2016-02-26 09:58:44

Tags: algorithm scala apache-spark spark-graphx

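I have a Graph[Int, Int] in which every edge carries a weight. For each user, I want to collect all of that user's edges and sum the weights associated with them.

Say the data looks like this: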

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // Assumes the SparkContext `sc` has already been constructed (e.g. in spark-shell)
    // Create an RDD for the vertices
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Array((3L, ("rxin", "student")),
                           (7L, ("jgonzal", "postdoc")),
                           (5L, ("franklin", "prof")),
                           (2L, ("istoica", "prof"))))

    // Create an RDD for edges
    val relationships: RDD[Edge[Int]] =
      sc.parallelize(Array(Edge(3L, 7L, 12),
                           Edge(5L, 3L, 1),
                           Edge(2L, 5L, 3),
                           Edge(5L, 7L, 5)))

    // Define a default user in case there are relationships with missing users
    val defaultUser = ("John Doe", "Missing")

    // Build the initial Graph
    val graph = Graph(users, relationships, defaultUser)

My ideal result is a DataFrame with the vertex id and the total weight for that vertex. It is basically a weighted degree measure:

id    value
3L    1
5L    3
7L    17
2L    0
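To make these numbers concrete: they match the sum of the weights on each vertex's incoming edges, i.e. a weighted in-degree. The following is only a plain-Scala sketch without Spark that reproduces the table from the edge list above; the names `edges`, `vertexIds`, `incoming` and `expected` are made up for illustration.

```scala
// Edges from the example as (src, dst, weight) triples.
val edges = Seq((3L, 7L, 12), (5L, 3L, 1), (2L, 5L, 3), (5L, 7L, 5))
val vertexIds = Seq(3L, 7L, 5L, 2L)

// Sum the weights of the edges pointing at each destination vertex.
val incoming: Map[Long, Int] =
  edges.groupBy(_._2).map { case (dst, es) => dst -> es.map(_._3).sum }

// Vertices without any incoming edge get 0.
val expected: Map[Long, Int] =
  vertexIds.map(id => id -> incoming.getOrElse(id, 0)).toMap
// expected == Map(3 -> 1, 7 -> 17, 5 -> 3, 2 -> 0)
```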

1 Answer:

Answer 0 (score: 0)

    // toDF needs the SQL implicits in scope, e.g. import sqlContext.implicits._
    val temp = graph.aggregateMessages[Int](
      triplet => triplet.sendToDst(triplet.attr), _ + _, TripletFields.EdgeOnly
    ).toDF("id", "value")

    temp.show()
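One caveat: `aggregateMessages` only returns vertices that actually receive a message, so a vertex with no incoming edges (2L in the question's data) would be missing rather than listed with 0. A possible way to fill in the zeros is to join the sums back onto the graph with `outerJoinVertices`. The sketch below assumes a local `SparkContext` created just for illustration; the names `inWeights`, `joined` and `weightedInDegree` are hypothetical and not part of the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Assumption: a local context for illustration; in spark-shell, skip this line and use the provided `sc`.
val sc = new SparkContext(new SparkConf().setAppName("weighted-in-degree").setMaster("local[*]"))

// Same vertices and edges as in the question.
val users = sc.parallelize(Seq(
  (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
  (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
val relationships = sc.parallelize(Seq(
  Edge(3L, 7L, 12), Edge(5L, 3L, 1), Edge(2L, 5L, 3), Edge(5L, 7L, 5)))
val graph = Graph(users, relationships, ("John Doe", "Missing"))

// Sum of incoming edge weights; vertices that receive no message are absent here.
val inWeights: VertexRDD[Int] =
  graph.aggregateMessages[Int](ctx => ctx.sendToDst(ctx.attr), _ + _, TripletFields.EdgeOnly)

// Join the sums back onto every vertex, defaulting to 0 where no message arrived.
val joined = graph.outerJoinVertices(inWeights)((_, _, sum) => sum.getOrElse(0))
val weightedInDegree: Map[VertexId, Int] = joined.vertices.collect().toMap

sc.stop()
```

With the question's data this should map 3 to 1, 5 to 3, 7 to 17 and 2 to 0. If a DataFrame is needed, `joined.vertices` can be converted with `toDF("id", "value")` in the same way as above, provided the SQL implicits are in scope.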