Merging multiple graphs together in GraphX

Time: 2018-06-07 14:31:03

Tags: apache-spark spark-graphx

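Hello, I have built multiple multigraphs (11 in total).

e.g. Graph 1 - SongArtist - SongVertex(Id, SongName), ArtistVertex(Id, ArtistName, NetWorth), Edge(Song, Artist, "Sung")

Graph 2 - SongWriter - SongVertex(Id, SongName), WriterVertex(Id, ArtistName), Edge(Song, Writer, "WrittenBy")

Graph 3 - ArtistWriter - ArtistVertex(Id, ArtistName, NetWorth), WriterVertex(Id, ArtistName), Edge(Artist, Writer, "Collaborated")     ...

I want to be able to merge all of these into a single graph. Graph1 and Graph2 can be merged on Song; Graph2 and Graph3 can be merged on Writer; Graph1 and Graph3 can be merged on Artist.

Some of the graphs have edge properties and vertex properties defined by case classes. The following shows how Graph3 was built. The others follow more or less the same structure: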

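import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// EdgeProperty and VertexProperty are the common supertypes; their definitions are not shown in the question.
case class ArtistWriterProperties(weight: String, edgeType: String) extends EdgeProperty
case class ArtistProperty(vertexType: String, artistName: String, netWorth: String) extends VertexProperty
case class WriterProperty(vertexType: String, writerName: String) extends VertexProperty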

val ArtistWriter: RDD[(VertexId, VertexProperty)] = sc.textFile(vertexArtistWriter).map {
  line =>
    val row = line.split(",")
    val id = row(0).toLong
    val vertexType = row(1)
    val prop = vertexType match {
      case "Artist" => ArtistProperty(vertexType, row(2), row(3))
      case "Writer" => WriterProperty(vertexType, row(2))
    }
    (id, prop)
}

val edgesArtistWriterCollaborated: RDD[Edge[EdgeProperty]] = sc.textFile(edgeWeightedArtistWriterCollaborated).map {
  line =>
    val row = line.split(",")
    Edge(row(0).toLong, row(1).toLong, ArtistWriterProperties(row(2), row(3)))
}

val graph3 = Graph(ArtistWriter, edgesArtistWriterCollaborated)

I am trying it this way:

val graph2And3 = Graph(
  graph2.vertices.union(graph3.vertices),
  graph2.edges.union(graph3.edges)
).partitionBy(RandomVertexCut).
  groupEdges( (attr1, attr2) => attr1 + attr2 )

But I get an error: type mismatch.

1 Answer:

Answer 0 (score: 1)

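So basically you need to perform a join for the vertices and a union for the edges.

For each graph, you can get an RDD of vertices and an RDD of edges.

1) Full outer join the vertex RDDs, in order, on the required keys, and create new IDs for the final vertices, e.g. graph1.vertexes.fullOuterJoin(graph2.vertexes, "SongArtist").fullOuterJoin...

2) Union all the edge RDDs. You can then create the graph from the new vertex RDD and the unioned edge RDD.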
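The two steps above can be sketched for a pair of graphs as follows. This is only a minimal illustration under stated assumptions, not the answerer's exact code. In GraphX the accessor is `vertices` (not `vertexes`), and `fullOuterJoin` on a pair RDD joins on the RDD's key rather than taking a key name, so the sketch first re-keys each vertex by a natural key. The names `MergeGraphsSketch`, `Key`, `keyed`, `merge`, `demo`, `SongProperty`, the shared `name` field, and the simplified `EdgeProperty` case class are all hypothetical and stand in for the question's own types. The sketch also assumes that (vertexType, name) uniquely identifies an entity across graphs. As an aside, one plausible source of the type mismatch in the question is `attr1 + attr2`: the edge attributes are case-class values with no `+` defined, so the merge function passed to `groupEdges` has to return the edge type explicitly.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object MergeGraphsSketch {
  // Hypothetical, simplified stand-ins for the question's property types.
  sealed trait VertexProperty { def vertexType: String; def name: String }
  case class SongProperty(vertexType: String, name: String) extends VertexProperty
  case class ArtistProperty(vertexType: String, name: String, netWorth: String) extends VertexProperty
  case class WriterProperty(vertexType: String, name: String) extends VertexProperty
  case class EdgeProperty(weight: String, edgeType: String)

  // Assumption: (vertexType, name) identifies the same entity in every graph.
  type Key = (String, String)
  type G = Graph[VertexProperty, EdgeProperty]

  // Re-key vertices by natural key, keeping the old per-graph ID.
  // Assumes every vertex carries a non-null property.
  private def keyed(g: G): RDD[(Key, (VertexId, VertexProperty))] =
    g.vertices.map { case (id, p) => ((p.vertexType, p.name), (id, p)) }

  def merge(g1: G, g2: G): G = {
    // Step 1: full outer join the vertices on the natural key, then assign new IDs.
    // Cached so the derived RDDs below reuse one ID assignment where possible.
    val withNewIds = keyed(g1).fullOuterJoin(keyed(g2)).zipWithUniqueId().cache()

    // At least one side of a full outer join row is defined; prefer the left property.
    val vertices: RDD[(VertexId, VertexProperty)] =
      withNewIds.map { case ((_, (l, r)), newId) => (newId, l.orElse(r).get._2) }

    // Old ID -> new ID lookup tables, one per input graph.
    val idMap1: RDD[(VertexId, VertexId)] =
      withNewIds.flatMap { case ((_, (l, _)), newId) => l.map(v => (v._1, newId)).toList }
    val idMap2: RDD[(VertexId, VertexId)] =
      withNewIds.flatMap { case ((_, (_, r)), newId) => r.map(v => (v._1, newId)).toList }

    // Rewrite both endpoints of every edge to the new IDs.
    def remap(edges: RDD[Edge[EdgeProperty]],
              idMap: RDD[(VertexId, VertexId)]): RDD[Edge[EdgeProperty]] =
      edges.map(e => (e.srcId, e))
        .join(idMap)
        .map { case (_, (e, newSrc)) => (e.dstId, (newSrc, e.attr)) }
        .join(idMap)
        .map { case (_, ((newSrc, attr), newDst)) => Edge(newSrc, newDst, attr) }

    // Step 2: union the remapped edges and build the merged graph.
    Graph(vertices, remap(g1.edges, idMap1).union(remap(g2.edges, idMap2)))
  }

  // Tiny local demo: the writer "W1" appears in both graphs under different IDs.
  def demo(): (Long, Long) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("merge-graphs-sketch")
    val sc = new SparkContext(conf)
    try {
      val graph2: G = Graph(
        sc.parallelize(Seq[(VertexId, VertexProperty)](
          (1L, SongProperty("Song", "S1")), (2L, WriterProperty("Writer", "W1")))),
        sc.parallelize(Seq(Edge(1L, 2L, EdgeProperty("1", "WrittenBy")))))
      val graph3: G = Graph(
        sc.parallelize(Seq[(VertexId, VertexProperty)](
          (1L, ArtistProperty("Artist", "A1", "100")), (5L, WriterProperty("Writer", "W1")))),
        sc.parallelize(Seq(Edge(1L, 5L, EdgeProperty("1", "Collaborated")))))
      val merged = merge(graph2, graph3)
      (merged.vertices.count(), merged.edges.count())
    } finally sc.stop()
  }
}
```

Calling `MergeGraphsSketch.demo()` merges the two toy graphs on the shared writer. To fold in further graphs, `merge` can be applied repeatedly. If parallel edges between the same pair of vertices should then be combined, `partitionBy(PartitionStrategy.RandomVertexCut)` followed by `groupEdges` can be used, as long as the merge function returns an `EdgeProperty`, for example `(a, b) => a`.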