When I run an LDA model from a Scala script, it terminates with a "No space left on device" error once the number of iterations gets large.
I check the amount of free space with the following script:
import sys.process._

// Run one shell command per task and report the free space on /local_disk per node.
val perNodeSpaceInGB = sc.parallelize(0 to 100).map { _ =>
  val hostname = ("hostname".!!).trim
  // Field 9 of the whitespace-split `df` output is the data row's "Available" column (in KB);
  // integer division to whole GB lets distinct collapse repeated readings from the same node.
  val spaceInGB = ("df /local_disk".!!).split(" +")(9).toInt / 1024 / 1024
  //System.gc()
  (hostname, spaceInGB)
}.collect.distinct
println(f"There are ${perNodeSpaceInGB.size} nodes in this cluster. Per node free space (in GB):\n--------------------------------------")
perNodeSpaceInGB.foreach { case (a, b) => println(f"$a\t\t$b%2.2f") }
val totalSpaceInGB = perNodeSpaceInGB.map(_._2).sum
and watched the free space shrink steadily to zero, at which point the job dies. It looks as though some temporary files are not being deleted in time. Checkpointing is set to every 10 iterations, roughly as sketched below.
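For reference, here is a minimal sketch of how that checkpointing is wired up. The checkpoint directory path, the topic count, the toy corpus, and the choice of EMLDAOptimizer are illustrative assumptions, not the exact production settings:

import org.apache.spark.mllib.clustering.{EMLDAOptimizer, LDA}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Checkpoints must go to a directory visible to all executors (path is hypothetical).
sc.setCheckpointDir("/checkpoints/lda")

// Tiny stand-in corpus of (document id, term-count vector) pairs.
val corpus: RDD[(Long, Vector)] = sc.parallelize(Seq(
  (0L, Vectors.dense(1.0, 2.0, 0.0)),
  (1L, Vectors.dense(0.0, 1.0, 3.0))
))

val ldaModel = new LDA()
  .setK(3)                    // topic count (illustrative)
  .setMaxIterations(500)      // many iterations, as in the failing runs
  .setCheckpointInterval(10)  // checkpoint every 10 iterations, as described above
  .setOptimizer(new EMLDAOptimizer)
  .run(corpus)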
Any hints?
The error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 885.0 failed 4 times, most recent failure: Lost task 2.3 in stage 885.0 (TID 586, 10.0.239.157): java.io.IOException: No space left on device