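I have a Graph[Int, Int] in which every edge has a weight value. What I want to do is collect all of the edges for each user and sum the weights associated with each user.
Say the data looks like this: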
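import org.apache.spark.SparkContext
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
// Assume the SparkContext has already been constructed
val sc: SparkContext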
// Create an RDD for the vertices
val users: RDD[(VertexId, (String, String))] =
sc.parallelize(Array((3L, ("rxin", "student")),
(7L,("jgonzal", "postdoc")),
(5L, ("franklin", "prof")),
(2L, ("istoica", "prof"))))
// Create an RDD for edges
val relationships: RDD[Edge[Int]] =
sc.parallelize(Array(Edge(3L, 7L, 12),
Edge(5L, 3L, 1),
Edge(2L, 5L, 3),
Edge(5L, 7L, 5)))
// Define a default user in case there are relationships with missing users
val defaultUser = ("John Doe", "Missing")
// Build the initial Graph
val graph = Graph(users, relationships, defaultUser)
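My ideal result is a DataFrame with the vertex id and the total weight value. It is basically a weighted degree measure (for the data above, the summed weight of each vertex's incoming edges):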
id value
3L 1
5L 3
7L 17
2L 0
Answer 0 (score: 0)
// Sum the weights of incoming edges per vertex; toDF needs spark.implicits._ in scope
val temp = graph.aggregateMessages[Int](triplet => triplet.sendToDst(triplet.attr), _ + _, TripletFields.EdgeOnly).toDF("id", "value")
temp.show()
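One thing to watch for: aggregateMessages only returns vertices that actually received a message, so vertex 2, which has no incoming edges, would be missing from the result rather than showing up with 0. A minimal sketch of one way to fill in the zeros, assuming a local SparkSession (the names spark, inWeights and withZeros are made up for this example) and using outerJoinVertices to default missing sums to 0:

```scala
import org.apache.spark.graphx._
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("weighted-in-degree").getOrCreate()
import spark.implicits._
val sc = spark.sparkContext

// Same sample data as in the question
val users = sc.parallelize(Seq[(VertexId, (String, String))](
  (3L, ("rxin", "student")),
  (7L, ("jgonzal", "postdoc")),
  (5L, ("franklin", "prof")),
  (2L, ("istoica", "prof"))))
val relationships = sc.parallelize(Seq(
  Edge(3L, 7L, 12),
  Edge(5L, 3L, 1),
  Edge(2L, 5L, 3),
  Edge(5L, 7L, 5)))
val graph = Graph(users, relationships, ("John Doe", "Missing"))

// Sum of incoming edge weights; only vertices with at least one incoming edge appear here
val inWeights: VertexRDD[Int] =
  graph.aggregateMessages[Int](ctx => ctx.sendToDst(ctx.attr), _ + _, TripletFields.EdgeOnly)

// Join the sums back onto every vertex, defaulting to 0 where no message was received
val withZeros: Graph[Int, Int] =
  graph.outerJoinVertices(inWeights) { (_, _, weightOpt) => weightOpt.getOrElse(0) }

// Row order in the output is not guaranteed
withZeros.vertices.toDF("id", "value").show()
```

With the sample data this should give one row per vertex: 3 with 1, 5 with 3, 7 with 17, and 2 with 0. To weight outgoing edges instead, swap sendToDst for sendToSrc.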