Suppose some vertices in my graph have null attributes. Is there a way to filter them out with subgraph?
import org.apache.spark.rdd.RDD
import org.apache.spark._
import org.apache.spark.graphx._
// Create an RDD for the vertices
val users: RDD[(VertexId, (String, String))] =
  sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
    (5L, ("franklin", "prof")), (2L, ("istoica", "prof")),
    (4L, null)))
// Create an RDD for edges
val relationships: RDD[Edge[String]] =
  sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
    Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
    Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))
// Define a default user for relationships that reference a missing user
val defaultUser = ("John Doe", "Missing")
//val defaultUser = null
// Build the initial Graph
val graph = Graph(users, relationships, defaultUser)
graph.triplets.collect.foreach(println)
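Before filtering, it helps to check which vertices actually carry a null attribute. The sketch below uses a trimmed-down version of the graph above (only vertices 4 and 5 and their two edges) and creates its own local SparkContext instead of the shell's `sc`; the object and method names are hypothetical. Vertex 4 is listed explicitly with a null attribute, so GraphX keeps that null, while vertex 0 appears only as an edge endpoint and therefore receives `defaultUser`. In other words, the default fills in vertices missing from the vertex RDD; it does not replace a null attribute that is already present.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only
object DefaultVsNullAttr {
  // Returns every vertex id with its attribute, sorted by id
  def vertexAttrs(sc: SparkContext): Array[(VertexId, (String, String))] = {
    // Vertex 4 is listed with an explicit null; vertex 0 is not listed at all
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Seq((5L, ("franklin", "prof")), (4L, null)))
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Seq(Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))
    // The default attribute is only used for ids that appear in edges but not in `users`
    val graph = Graph(users, relationships, ("John Doe", "Missing"))
    graph.vertices.collect().sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("default-vs-null").setMaster("local[*]"))
    try vertexAttrs(sc).foreach(println)
    finally sc.stop()
  }
}
```

This is also why the commented-out `val defaultUser = null` line would not help here: vertex 4 is null because of the vertex RDD, not because of the default.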
When I try this:
val validGraph = graph.subgraph(vpred = (id, attr) => (id,attr._1) != null)
validGraph.triplets.collect.foreach(println)
I get a NullPointerException. I can filter the triplets instead, like this:
graph.triplets.filter(triplet => triplet.srcAttr != null)
But the problem is that this gives me an RDD of EdgeTriplet rather than a Graph, and after this step I need to call pregel.
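The NullPointerException comes from `attr._1`: for vertex 4 the attribute tuple itself is null, so reading `_1` dereferences null (and the tuple `(id, attr._1)` is never null anyway, so that predicate could not filter anything). A minimal sketch, assuming a local SparkContext and hypothetical helper names, that tests `attr != null` inside `subgraph` instead. Because `subgraph` also drops any edge whose source or destination fails `vpred`, the result is still a `Graph`, so `pregel` can be called on it directly; the hop-count Pregel program here is only an illustrative stand-in for whatever computation follows.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only
object NullVertexSubgraph {
  // Same vertices and edges as in the question
  def buildGraph(sc: SparkContext): Graph[(String, String), String] = {
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Seq((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
        (5L, ("franklin", "prof")), (2L, ("istoica", "prof")), (4L, null)))
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Seq(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
        Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))
    Graph(users, relationships, ("John Doe", "Missing"))
  }

  // Test the attribute itself for null instead of dereferencing it.
  // subgraph also removes every edge whose source or destination fails vpred.
  def dropNullVertices(g: Graph[(String, String), String]): Graph[(String, String), String] =
    g.subgraph(vpred = (_, attr) => attr != null)

  // Illustrative Pregel run: number of hops from `src`, following edge direction
  def hopsFrom(g: Graph[(String, String), String], src: VertexId): Graph[Double, String] = {
    val init = g.mapVertices((id, _) => if (id == src) 0.0 else Double.PositiveInfinity)
    init.pregel(Double.PositiveInfinity)(
      (_, dist, msg) => math.min(dist, msg), // keep the shorter distance
      t =>                                    // relax edge src -> dst
        if (t.srcAttr + 1.0 < t.dstAttr) Iterator((t.dstId, t.srcAttr + 1.0))
        else Iterator.empty,
      (a, b) => math.min(a, b))               // merge competing messages
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("null-vertex-subgraph").setMaster("local[*]"))
    try {
      val valid = dropNullVertices(buildGraph(sc))
      valid.triplets.collect().foreach(println)
      hopsFrom(valid, 2L).vertices.collect().sortBy(_._1).foreach(println)
    } finally sc.stop()
  }
}
```

If you already have filtered triplets, a graph can also be rebuilt from them with `Graph(vertices, triplets.map(t => Edge(t.srcId, t.dstId, t.attr)))`, but `subgraph` avoids that round trip and keeps the vertex and edge filtering consistent in one call.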