How do I get the number of common edges in Spark GraphX?

Time: 2015-07-21 13:55:56

Tags: scala apache-spark spark-graphx

For example, suppose I have two graphs with the following vertices and edges:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val vertexRdd1: RDD[(VertexId, (String, Int))] = sc.parallelize(Array(
    (1L, ("a", 28)),
    (2L, ("b", 27)),
    (3L, ("c", 65))
))

val edgeRdd1: RDD[Edge[Int]] = sc.parallelize(Array(
    Edge(1L, 2L, 1),
    Edge(2L, 3L, 8)
))

val vertexRdd2: RDD[(VertexId, (String, Int))] = sc.parallelize(Array(
    (1L, ("a", 28)),
    (2L, ("b", 27)),
    (3L, ("c", 28)),
    (4L, ("d", 27)),
    (5L, ("e", 65))
))

val edgeRdd2: RDD[Edge[Int]] = sc.parallelize(Array(
    Edge(1L, 2L, 1),
    Edge(2L, 3L, 4),
    Edge(3L, 5L, 1),
    Edge(2L, 4L, 1)
))

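How can I get the number of common edges between these two graphs, ignoring the edge attributes? In the example above, the number of common edges is 2: Edge(1L, 2L, 1) matches Edge(1L, 2L, 1), and Edge(2L, 3L, 8) matches Edge(2L, 3L, 4).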

I am programming in Scala.

1 answer:

Answer 0 (score: 1)

Assuming you have graph1 (Graph(vertexRdd1, edgeRdd1)) and graph2 (Graph(vertexRdd2, edgeRdd2)), you can map each edge to a (srcId, dstId) pair and then use the intersection method:

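// Keep only the endpoints of each edge, dropping the edge attribute
val srcDst1 = graph1.edges.map(e => (e.srcId, e.dstId))
val srcDst2 = graph2.edges.map(e => (e.srcId, e.dstId))
// Count the (srcId, dstId) pairs present in both graphs
srcDst1.intersection(srcDst2).count()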
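
Two properties of this approach are worth keeping in mind: the pairs are compared as directed edges, so (1, 2) and (2, 1) do not match, and RDD.intersection removes duplicates, so parallel edges are counted once. The following is a minimal sketch of the same logic on local Scala collections, useful as a sanity check without a SparkContext. The CommonEdges object, its Edge case class (a stand-in for the GraphX Edge), and the commonDirected and commonUndirected helpers are illustrative names, not part of GraphX; the undirected variant is an assumption about what one might want, not something the question asks for.

```scala
// Local-collection sketch (no SparkContext needed); all names are illustrative.
object CommonEdges {
  // Minimal stand-in for org.apache.spark.graphx.Edge[Int]
  final case class Edge(srcId: Long, dstId: Long, attr: Int)

  // Directed comparison: (1, 2) and (2, 1) count as different edges
  def commonDirected(a: Seq[Edge], b: Seq[Edge]): Int = {
    val pairsA = a.map(e => (e.srcId, e.dstId)).toSet
    val pairsB = b.map(e => (e.srcId, e.dstId)).toSet
    (pairsA intersect pairsB).size
  }

  // Undirected comparison: order each pair as (min, max) before intersecting
  def commonUndirected(a: Seq[Edge], b: Seq[Edge]): Int = {
    def norm(e: Edge): (Long, Long) =
      (math.min(e.srcId, e.dstId), math.max(e.srcId, e.dstId))
    (a.map(norm).toSet intersect b.map(norm).toSet).size
  }

  // Edge lists from the question
  val edges1: Seq[Edge] = Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 8))
  val edges2: Seq[Edge] =
    Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 4), Edge(3L, 5L, 1), Edge(2L, 4L, 1))
}
```

With the edge lists from the question, CommonEdges.commonDirected(CommonEdges.edges1, CommonEdges.edges2) gives 2. If direction should be ignored in Spark as well, the same (min, max) ordering could be applied inside the map over graph1.edges and graph2.edges before calling intersection.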