I want to load an RDD, and create it if the load fails. I thought the code below would work, but it still fails even though sc.textFile() is inside the try block. What am I missing, or how do I do this properly? Thanks!
// look for my RDD, load or make it
val rdddump = "hdfs://localhost/Users/data/hdfs/namenode/myRDD.txt"
val myRdd = try {
  sc.textFile(rdddump)
} catch {
  case _: Throwable =>
    println("failed to load RDD from HDFS")
    val newRdd = [....code to make new RDD here...]
    newRdd.saveAsTextFile(rdddump)
    newRdd
}
println(myRdd)
println("RDD count = " + myRdd.count)
The error looks like this:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost/Users/data/hdfs/namenode/myRDD.txt
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
at org.apache.spark.rdd.RDD.count(RDD.scala:861)
...
Answer 0 (score: 4)
You are catching the exception in the wrong place, as the stack trace shows clearly. Calling sc.textFile
does nothing but declare a relationship between an operation and an RDD; at that point nothing has triggered any computation that would cause Spark to check whether the input exists. The exception is only thrown later, when an action such as count() forces the RDD to be evaluated, and that call sits outside your try block.
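One way around this, as a minimal sketch: ask HDFS directly whether the path exists via the Hadoop FileSystem API, so you don't rely on exceptions at all. FileSystem, Path, and sc.hadoopConfiguration are standard Hadoop/Spark APIs; the sc.parallelize line is a hypothetical stand-in for the question's actual RDD-building code.

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val rdddump = "hdfs://localhost/Users/data/hdfs/namenode/myRDD.txt"

// Check for the path on HDFS up front instead of waiting for an action to fail.
val fs = FileSystem.get(new URI(rdddump), sc.hadoopConfiguration)

val myRdd = if (fs.exists(new Path(rdddump))) {
  sc.textFile(rdddump)
} else {
  println("no RDD found on HDFS, building it")
  val newRdd = sc.parallelize(Seq("placeholder")) // hypothetical stand-in for the real RDD-building code
  newRdd.saveAsTextFile(rdddump)
  newRdd
}

println("RDD count = " + myRdd.count)

Alternatively, if you prefer to keep the try/catch, call an action such as count() on the loaded RDD inside the try block, so the missing-input error is raised where you can actually catch it.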