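I have a parent graph that I want to filter into multiple subgraphs, so that I can apply a function to each subgraph and extract some data. My code looks like this: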
val myTerms = <RDD of terms I want to use to filter the graph>
val myVertices = ...
val myEdges = ...
val myGraph = Graph(myVertices, myEdges)
val myResults : RDD[(<Tuple>)] = myTerms.map { x => mySubgraphFunction(myGraph, x) }
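Here mySubgraphFunction is a function that creates a subgraph, performs a calculation on it, and returns a tuple of result data.
When I run this, I get a Java null pointer exception at the point where mySubgraphFunction calls GraphX.subgraph. If I call collect on the RDD of terms, I can get it to work (I also added persist on the vertex and edge RDDs for performance):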
val myTerms = <RDD of terms I want to use to filter the graph>
val myVertices = <read RDD>.persist(StorageLevel.MEMORY_ONLY_SER)
val myEdges = <read RDD>.persist(StorageLevel.MEMORY_ONLY_SER)
val myGraph = Graph(myVertices, myEdges)
val myResults : Array[(<Tuple>)] = myTerms.collect().map { x =>
  mySubgraphFunction(myGraph, x)
}
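Is there a way to get this to work without calling collect() (i.e. to make this a distributed operation)? I am creating ~1k subgraphs and the performance is slow.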
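For context, Spark does not support referencing one RDD inside another RDD's transformation, and a GraphX Graph is backed by RDDs, which is consistent with the null pointer exception in the first version. One direction I am considering, shown only as a rough sketch: keep collect() on the (small) terms RDD, but submit the per-term jobs concurrently from the driver using Scala futures, so that each subgraph computation is still distributed and the jobs are not run strictly one after another. The vertex and edge types (String), the body of mySubgraphFunction, and the helper name runAll are all hypothetical stand-ins, since my real types and function are not shown above.

```scala
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical stand-in for mySubgraphFunction: keeps only the edges whose
// attribute equals the term, then counts the edges that survive.
def mySubgraphFunction(graph: Graph[String, String], term: String): (String, Long) = {
  val sub = graph.subgraph(epred = triplet => triplet.attr == term)
  (term, sub.edges.count())
}

// Hypothetical helper: runs one Spark job per term, submitted concurrently
// from driver threads. Assumes the set of terms is small enough to collect.
def runAll(graph: Graph[String, String], terms: RDD[String]): Array[(String, Long)] = {
  graph.cache() // the parent graph is reused by every job, so keep it cached

  // Each Future submits its own job from a driver-side thread.
  val futures = terms.collect().toSeq.map { term =>
    Future(mySubgraphFunction(graph, term))
  }

  // Block until every per-term result is available.
  Await.result(Future.sequence(futures), Duration.Inf).toArray
}
```

The global execution context limits concurrency to roughly the number of driver cores, so whether this actually helps would depend on how much of the cluster a single subgraph job leaves idle.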