When I run GraphX's connectedComponents twice with the same data and the same code, the results are different. Why?
Here is my code:
import org.apache.spark.graphx.{Edge, Graph}

// Load the edge table; each row holds the two endpoint names as strings.
val edgeDF = sql("select * from fkdm.fkdm_fk_base_sna_bm_edge_s_d").rdd
val edgePair = edgeDF.map { line =>
  val v1 = line.getString(0)
  val v2 = line.getString(1)
  (v1, v2)
}

// Collect every distinct vertex name and assign it a numeric ID.
val vertices = edgePair.map(line => line._1).union(edgePair.map(line => line._2)).distinct()
val verticesUniqueId = vertices.zipWithUniqueId()

// Replace both endpoint names with their numeric IDs via two joins.
val edge_numberic = edgePair.join(verticesUniqueId)
  .map(line => (line._2._1, line._2._2))
  .join(verticesUniqueId)
  .map(line => (line._2._1, line._2._2))

val vertice_rdd = verticesUniqueId.map { line =>
  (line._2.toLong, line._1)
}
val edge_rdd = edge_numberic.map { line =>
  Edge(line._1, line._2, 1)
}

val graph = Graph(vertice_rdd, edge_rdd)
val cc = graph.connectedComponents()

// ComponentsGraph is a case class taking two String arguments (its definition is not shown here).
val ccDF = cc.vertices.map(x => ComponentsGraph(x._1.toString, x._2.toString)).toDF()
ccDF.createOrReplaceTempView("temp_ccDF")
sql("drop table if exists buming.buming_sna_vertices_groupnum ")
sql("CREATE TABLE buming.buming_sna_vertices_groupnum AS SELECT * FROM temp_ccDF")
I found that the number of vertices in the largest group differs between the two runs.
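To make that comparison concrete, here is a minimal sketch of how the size of the largest group could be computed from (vertexId, componentId) pairs shaped like the ones in cc.vertices. It is plain Scala on an in-memory Seq with made-up sample data, and `componentSizes` and `largestComponentSize` are hypothetical helper names, not part of GraphX.

```scala
// Hypothetical helpers (not part of GraphX): summarize (vertexId, componentId) pairs.
object ComponentSizes {
  // Count how many vertices carry each component ID.
  def componentSizes(pairs: Seq[(Long, Long)]): Map[Long, Int] =
    pairs.groupBy(_._2).map { case (componentId, members) => componentId -> members.size }

  // Size of the largest component, or 0 when there are no vertices.
  def largestComponentSize(pairs: Seq[(Long, Long)]): Int = {
    val sizes = componentSizes(pairs).values
    if (sizes.isEmpty) 0 else sizes.max
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample: vertices 1, 2, 3 are in component 1; vertices 4, 5 are in component 4.
    val sample = Seq((1L, 1L), (2L, 1L), (3L, 1L), (4L, 4L), (5L, 4L))
    println(componentSizes(sample))
    println(largestComponentSize(sample)) // prints 3
  }
}
```

On the real data, the same per-component counts could presumably be obtained directly from the RDD with something like `cc.vertices.map(_._2).countByValue()`, and the maximum count of each run compared.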