So I've started to get into the habit of debugging in the spark-shell, and I've run into a problem collecting from a textFile. I basically have a file in HDFS containing JSON strings, and when I try

val y = sc.textFile("hdfs://nameservice1/path/to/file.txt").collect()

I get the following error:
16/05/26 23:51:42 WARN storage.BlockManager: Block broadcast_0 already exists on this machine; not re-adding it
16/05/26 23:51:42 WARN storage.BlockManager: Block broadcast_0_piece0 already exists on this machine; not re-adding it
16/05/26 23:51:42 INFO spark.SparkContext: Created broadcast 0 from textFile at QueryAnalyzer.scala:50
java.lang.ClassCastException: scala.collection.immutable.HashMap$HashTrieMap cannot be cast to org.apache.spark.SerializableWritable
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:140)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:196)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
Has anyone run into something like this before?