Scala - Spark: Building a Graph (GraphX) from vertex and edge DataFrames

Date: 2017-09-27 15:49:18

Tags: scala apache-spark spark-dataframe spark-graphx

I have two DataFrames with the following schemas:

    edges
    |-- src: string (nullable = true)
    |-- dst: string (nullable = true)
    |-- relationship: struct (nullable = false)
    |    |-- business_id: string (nullable = true)
    |    |-- normalized_influence: double (nullable = true)

    vertices
    |-- id: string (nullable = true)
    |-- state: boolean (nullable = true)

To build the graph, I converted these DataFrames as follows:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import scala.util.hashing.MurmurHash3

case class Relationship(business_id: String, normalized_influence: Double)
case class MyEdge(src: String, dst: String, relationship: Relationship)

// Hash the string IDs to Long values, since GraphX requires a Long VertexId.
val edgesRDD: RDD[Edge[Relationship]] = communityEdgeDF.as[MyEdge].rdd.map { edge =>
  Edge(
    MurmurHash3.stringHash(edge.src).toLong,
    MurmurHash3.stringHash(edge.dst).toLong,
    edge.relationship
  )
}

case class MyVertex(id: String, state: Boolean)

// Use the same hash for vertex IDs so they line up with the edge endpoints.
val verticesRDD: RDD[(VertexId, (String, Boolean))] = communityVertexDF.as[MyVertex].rdd.map { vertex =>
  (
    MurmurHash3.stringHash(vertex.id).toLong,
    (vertex.id, vertex.state)
  )
}

val graphX = Graph(verticesRDD, edgesRDD)
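
A side note on the hashing step: `MurmurHash3.stringHash` returns a 32-bit `Int`, so two different ID strings can in principle map to the same `VertexId`. The following is a minimal sketch, on plain Scala collections rather than RDDs, of how one might check a set of IDs for such collisions. `HashCollisionCheck`, `toVertexId` and `collisions` are hypothetical names introduced for illustration; they are not part of the original code or of GraphX.

```scala
import scala.util.hashing.MurmurHash3

object HashCollisionCheck {
  // Same string-to-ID mapping as in the question (GraphX's VertexId is an alias for Long).
  def toVertexId(id: String): Long = MurmurHash3.stringHash(id).toLong

  // Groups the distinct ID strings by hashed value and keeps only
  // the hash values shared by more than one string.
  def collisions(ids: Iterable[String]): Map[Long, Set[String]] = {
    val distinct: Set[String] = ids.toSet
    distinct.groupBy(toVertexId).filter { case (_, group) => group.size > 1 }
  }

  def main(args: Array[String]): Unit = {
    // A few ID strings copied from the vertex sample in the question.
    val sample = Seq("KRZALzi0ZgrGYyjZNg72_g", "JiFBQ_-vWgJtRZEEruSStg", "LZge-YpVL0ukJVD2nw5sag")
    println(s"Colliding IDs: ${collisions(sample)}")
  }
}
```

An empty result means every string in the sample received its own ID. On a large data set the same grouping would have to be done on the DataFrame or RDD instead of a local collection.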

Here is part of the vertex output:

res6: Array[(org.apache.spark.graphx.VertexId, (String, Boolean))] = Array((1874415454,(KRZALzi0ZgrGYyjZNg72_g,false)), (1216259959,(JiFBQ_-vWgJtRZEEruSStg,false)), (-763896211,(LZge-YpVL0ukJVD2nw5sag,false)), (-2032982683,(BHP3LVkTOfh3w4UIhgqItg,false)), (844547135,(JRC3La2fiNkK0VU7qZ9vyQ,false)) 

And the edges:

res3: Array[org.apache.spark.graphx.Edge[Relationship]] = Array(Edge(-268040669,1495494297,Relationship(cJWbbvGmyhFiBpG_5hf5LA,0.0017532149785518423)), Edge(-268040669,-125364603,Relationship(cJWbbvGmyhFiBpG_5hf5LA,0.0017532149785518423))

But when I run:

graphX.vertices.collect

I get incorrect output:

 Array((1981723824,null), (-333497649,null), (-597749329,null), (451246392,null), (-1287295481,null), (1013727024,null), (-194805089,null), (1621180464,null), (1874415454,(KRZALzi0ZgrGYyjZNg72_g,false)), (1539311488,null)

What is the problem? Am I building the Graph incorrectly?
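
One possible reading of this output, offered as an assumption rather than a confirmed diagnosis: `Graph(vertices, edges)` also creates a vertex for every edge endpoint that does not appear in the vertex RDD, and assigns it `defaultVertexAttr`, which defaults to `null`. The IDs shown with `null` would then be edge endpoints whose `src` or `dst` string has no matching `id` in the vertex DataFrame. The sketch below runs that check on plain Scala collections, using only the few IDs visible in the truncated samples above, so an ID it reports may still exist in the full vertex data. `MissingVertexCheck`, `vertexIds`, `edgeEndpoints` and `missing` are hypothetical names introduced for illustration.

```scala
object MissingVertexCheck {
  // Vertex IDs copied from the vertex sample shown in the question.
  val vertexIds: Set[Long] =
    Set(1874415454L, 1216259959L, -763896211L, -2032982683L, 844547135L)

  // (srcId, dstId) pairs copied from the edge sample shown in the question.
  val edgeEndpoints: Seq[(Long, Long)] = Seq(
    (-268040669L, 1495494297L),
    (-268040669L, -125364603L)
  )

  // Every ID that some edge references but no vertex row defines.
  val missing: Set[Long] =
    edgeEndpoints.flatMap { case (src, dst) => Seq(src, dst) }.toSet -- vertexIds

  def main(args: Array[String]): Unit =
    println(s"Edge endpoints without a vertex row: ${missing.toSeq.sorted.mkString(", ")}")
}
```

On the real graph, the equivalent check would be to filter `graphX.vertices` for entries whose attribute is `null`, or to pass an explicit third argument such as `Graph(verticesRDD, edgesRDD, ("missing", false))` so that filled-in vertices are easy to tell apart from real ones.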

0 answers:

No answers yet.