I'm using GraphX with Scala for some computation. I need to remove some vertices from the graph and loop several times, but the code always throws java.lang.StackOverflowError:
var rawG = GraphLoader.edgeListFile(sc, inputFilePath,
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

while (rawG.vertices.count() > 0) {
  // smallest degree in the current graph
  val stepDegree = rawG.degrees.map(_._2).min()
  // store each vertex's current degree as its attribute
  val g = rawG.joinVertices[Int](rawG.degrees)((_, _, newDeg) => newDeg)
  // keep only vertices whose degree is strictly above the minimum
  rawG = g.subgraph(vpred = (_, nodeDegree) => nodeDegree > stepDegree)
}
The initial graph has 100 million vertices. The graph gets smaller on every iteration, but the StackOverflowError always shows up after about 100 loops. My guess is that g.subgraph creates a new graph while the memory of the old graph object is not released immediately — could that be what causes the exception?
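One workaround I'm considering (untested sketch, assuming the error comes from the RDD lineage growing with every iteration rather than from memory): periodically call Graph.checkpoint to truncate the lineage, and unpersist the previous iteration's graph. The checkpoint directory path and the interval of 20 iterations here are made-up values.

```scala
import org.apache.spark.graphx._
import org.apache.spark.storage.StorageLevel

// Assumption: `sc` and `inputFilePath` exist as in the code above.
sc.setCheckpointDir("/tmp/graphx-checkpoints")  // hypothetical directory

var rawG = GraphLoader.edgeListFile(sc, inputFilePath,
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

var iter = 0
while (rawG.vertices.count() > 0) {
  val stepDegree = rawG.degrees.map(_._2).min()
  val g = rawG.joinVertices[Int](rawG.degrees)((_, _, newDeg) => newDeg)
  val pruned = g.subgraph(vpred = (_, nodeDegree) => nodeDegree > stepDegree)

  iter += 1
  if (iter % 20 == 0) {      // hypothetical interval: cut the lineage every 20 loops
    pruned.checkpoint()      // marks both the vertex and edge RDDs for checkpointing
    pruned.vertices.count()  // an action is needed to actually materialize the checkpoint
  }
  rawG.unpersist(blocking = false)  // drop the previous iteration's cached data
  rawG = pruned
}
```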
I tried setting the graph's storage level to StorageLevel.MEMORY_AND_DISK, but it throws: "Cannot change storage level of an RDD after it was already assigned a level"
scala> g.checkpoint
scala> g
res15: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@79863795
scala> g.persist(StorageLevel.MEMORY_AND_DISK)
java.lang.UnsupportedOperationException: Cannot change storage level of an RDD after it was already assigned a level
at org.apache.spark.rdd.RDD.persist(RDD.scala:170)
at org.apache.spark.rdd.RDD.persist(RDD.scala:195)
at org.apache.spark.graphx.impl.VertexRDDImpl.persist(VertexRDDImpl.scala:57)
at org.apache.spark.graphx.impl.VertexRDDImpl.persist(VertexRDDImpl.scala:27)
at org.apache.spark.graphx.impl.GraphImpl.persist(GraphImpl.scala:54)
... 53 elided
I read the GraphX source code: Graph.persist ultimately calls RDD.persist(newLevel: StorageLevel), and RDD.isLocallyCheckpointed always returns false, so RDD.persist() refuses to change the storage level.
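Since persist() cannot override a level that was already assigned, the level apparently has to be chosen at construction time. A possible sketch (untested, my assumption): rebuild the graph from its existing vertex and edge RDDs via Graph.apply, which accepts the storage levels as parameters. `g` refers to the graph from the session above; the default vertex attribute 0 is arbitrary.

```scala
import org.apache.spark.graphx._
import org.apache.spark.storage.StorageLevel

// Rebuild the graph so the desired storage level is set at construction,
// instead of calling persist() on an already-cached graph.
val rebuilt: Graph[Int, Int] = Graph(
  g.vertices,             // reuse the existing RDDs as inputs
  g.edges,
  defaultVertexAttr = 0,  // arbitrary attribute for vertices present only in edges
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
```

I'm not sure whether this is the intended way to change the level, or whether it just sidesteps the check.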