GraphX connectedComponents gives different results when run twice with the same data and code

Time: 2018-09-11 05:48:32

Tags: spark-graphx

When I run GraphX's connectedComponents twice with the same data and the same code, the results are different. Why?

Here is my code:

import org.apache.spark.graphx.{Edge, Graph}

val edgeDF = sql("select * from fkdm.fkdm_fk_base_sna_bm_edge_s_d").rdd
val edgePair = edgeDF.map{ line =>
    val v1 =  line.getString(0)
    val v2 =  line.getString(1)
    (v1,v2)
}
val vertices = edgePair.map(line => line._1)
    .union(edgePair.map(line => line._2))
    .distinct()

// Assign a unique Long ID to each distinct vertex name
val verticesUniqueId = vertices.zipWithUniqueId()
// Replace the source name, then the destination name, of each edge with its ID
val edge_numberic = edgePair.join(verticesUniqueId)
    .map(line => (line._2._1, line._2._2))
    .join(verticesUniqueId)
    .map(line => (line._2._1, line._2._2))

val vertice_rdd = verticesUniqueId.map{line =>
    (line._2.toLong,line._1)
}
val edge_rdd = edge_numberic.map{line =>
    Edge(line._1,line._2,1)
}
val graph = Graph(vertice_rdd,edge_rdd)
val cc = graph.connectedComponents()

val ccDF = cc.vertices
    .map(x => ComponentsGraph(x._1.toString, x._2.toString))
    .toDF()

ccDF.createOrReplaceTempView("temp_ccDF")
sql("drop table if exists buming.buming_sna_vertices_groupnum")
sql("CREATE TABLE buming.buming_sna_vertices_groupnum AS SELECT * FROM temp_ccDF")
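A plausible cause, not confirmed in the original thread: `verticesUniqueId` is not cached, yet it is used three times (in both edge joins and for `vertice_rdd`), so Spark may re-evaluate `vertices.distinct()` for each use. Spark's documentation for `zipWithUniqueId` states that, with n partitions, items in the kth partition get IDs k, n+k, 2n+k, ..., and warns that when an RDD does not guarantee the order of elements within a partition, the assigned IDs may change if the RDD is re-evaluated. The plain-Scala model below is a sketch, not Spark code: partitions are simulated as `Seq[Seq[String]]`, and `ZipWithUniqueIdModel` and `uniqueIds` are hypothetical names. It applies the documented numbering rule and shows that the same vertices read back in a different order receive different IDs.

```scala
object ZipWithUniqueIdModel {
  // Models the numbering rule documented for RDD.zipWithUniqueId:
  // with n partitions, the i-th item of partition k gets id i * n + k.
  // Partitions are simulated as plain Seqs; no Spark is involved.
  def uniqueIds(partitions: Seq[Seq[String]]): Map[String, Long] = {
    val n = partitions.size
    partitions.zipWithIndex.flatMap { case (items, k) =>
      items.zipWithIndex.map { case (name, i) => name -> (i.toLong * n + k) }
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // The same vertices in the same partitions, but partition 0 is read back
    // in a different order, as can happen when a shuffle output is re-read
    val firstEvaluation = Seq(Seq("a", "b"), Seq("c"))
    val secondEvaluation = Seq(Seq("b", "a"), Seq("c"))
    println(uniqueIds(firstEvaluation))  // Map(a -> 0, b -> 2, c -> 1)
    println(uniqueIds(secondEvaluation)) // Map(b -> 0, a -> 2, c -> 1)
  }
}
```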

I found that the number of vertices in the largest group differs between the two runs.
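For a fixed graph, connectedComponents labels each vertex with the lowest vertex ID in its component, so a change in group sizes suggests the input graph itself differs between runs. The sketch below assumes (hypothetically) that the two joins in the question saw different ID assignments for the same vertex names. It computes the largest component with a small union-find in plain Scala (no Spark; `MixedIdsModel`, `largestComponent`, and the ID maps are illustrative inventions) and shows that translating an edge's source and destination with different assignments changes the largest group's size.

```scala
import scala.collection.mutable

object MixedIdsModel {
  // Size of the largest connected component, computed with a small union-find
  def largestComponent(vertices: Iterable[Long], edges: Seq[(Long, Long)]): Int = {
    val parent = mutable.Map[Long, Long]()
    vertices.foreach(v => parent(v) = v)
    def find(x: Long): Long =
      if (parent(x) == x) x
      else { val root = find(parent(x)); parent(x) = root; root }
    edges.foreach { case (src, dst) => parent(find(src)) = find(dst) }
    vertices.groupBy(find).values.map(_.size).max
  }

  def main(args: Array[String]): Unit = {
    val edges = Seq("a" -> "b", "b" -> "c")
    // Hypothetical IDs from two evaluations of the same zipWithUniqueId RDD
    // (2 partitions; the order inside partition 0 differs between them)
    val idsA = Map("a" -> 0L, "b" -> 2L, "c" -> 1L, "d" -> 3L)
    val idsB = Map("b" -> 0L, "a" -> 2L, "c" -> 1L, "d" -> 3L)
    val vertexIds = idsA.values
    // Both endpoints translated with the same evaluation
    val consistent = edges.map { case (s, d) => (idsA(s), idsA(d)) }
    // Source from one evaluation, destination from another
    val mixed = edges.map { case (s, d) => (idsA(s), idsB(d)) }
    println(largestComponent(vertexIds, consistent)) // 3
    println(largestComponent(vertexIds, mixed))      // 2
  }
}
```

If this is what happens, one way to make the assignment stable is to fix the order before numbering, for example `vertices.sortBy(identity).zipWithUniqueId()`, in line with the sorting advice in Spark's `zipWithUniqueId` documentation. Persisting `verticesUniqueId` and materializing it with an action before reuse also helps, although evicted cached partitions are recomputed, so it is not a strict guarantee.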

0 answers