How to filter out null vertices from a graph

Asked: 2017-10-30 15:56:52

Tags: scala apache-spark graph

Suppose some vertices in my graph have a null attribute. Is there a way to filter them out with subgraph?

import org.apache.spark.rdd.RDD
import org.apache.spark._
import org.apache.spark.graphx._



// Create an RDD for the vertices
val users: RDD[(VertexId, (String, String))] =
  sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
                       (5L, ("franklin", "prof")), (2L, ("istoica", "prof")),
                       (4L, null)))

// Create an RDD for edges
val relationships: RDD[Edge[String]] =
  sc.parallelize(Array(Edge(3L, 7L, "collab"),    Edge(5L, 3L, "advisor"),
                       Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
                       Edge(4L, 0L, "student"),   Edge(5L, 0L, "colleague")))

// Define a default user in case there are relationships with a missing user
val defaultUser = ("John Doe", "Missing")

//val defaultUser = null

// Build the initial Graph
val graph = Graph(users, relationships, defaultUser)


graph.triplets.collect.foreach(println)

When I try this:

val validGraph = graph.subgraph(vpred = (id, attr) => (id,attr._1) != null)

validGraph.triplets.collect.foreach(println)

I get a NullPointerException. I can do something like a filter on the triplets instead:

graph.triplets.filter(triplet => triplet.srcAttr != null)

But the problem is that this gives me an RDD of EdgeTriplet as output rather than a Graph, and after this step I need to make a Pregel call.
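One possible approach, offered as a minimal sketch rather than a confirmed answer: the exception most likely comes from evaluating `attr._1` on vertex 4, whose attribute is null, and the comparison `(id, attr._1) != null` would be true for every vertex anyway, because a freshly built tuple is never null. Testing `attr` itself avoids both problems. The block below rebuilds the graph from the question. The local `SparkContext` setup and the application name are assumptions added so the snippet stands alone; in spark-shell the existing `sc` is reused.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: a local context for illustration; getOrCreate returns the
// spark-shell context if one is already running.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("filter-null-vertices").setMaster("local[*]"))

// Same vertices as in the question; vertex 4 has a null attribute.
val users: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
  (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
  (5L, ("franklin", "prof")), (2L, ("istoica", "prof")),
  (4L, null)))

// Same edges as in the question; vertex 0 only appears here,
// so it receives the default attribute.
val relationships: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
  Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
  Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))

val graph = Graph(users, relationships, ("John Doe", "Missing"))

// Check the attribute itself for null before touching its fields.
// subgraph also drops every edge that has a removed endpoint (here 4 -> 0).
val validGraph: Graph[(String, String), String] =
  graph.subgraph(vpred = (id, attr) => attr != null)

validGraph.triplets.collect.foreach(println)
```

Because `subgraph` returns a `Graph` and not an RDD of `EdgeTriplet`, a Pregel call can be made on `validGraph` directly afterwards.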

0 answers:

No answers yet