sc.textFile not working in spark-shell

Asked: 2016-05-26 23:58:19

Tags: scala hadoop apache-spark

So I've started getting into the habit of debugging in spark-shell, and I've run into a problem collecting the results of textFile.

Basically, I have a file of JSON strings in HDFS, and when I try

val y = sc.textFile("hdfs://nameservice1/path/to/file.txt").collect()

I get the following error:

16/05/26 23:51:42 WARN storage.BlockManager: Block broadcast_0 already exists on this machine; not re-adding it
16/05/26 23:51:42 WARN storage.BlockManager: Block broadcast_0_piece0 already exists on this machine; not re-adding it
16/05/26 23:51:42 INFO spark.SparkContext: Created broadcast 0 from textFile at QueryAnalyzer.scala:50
java.lang.ClassCastException: scala.collection.immutable.HashMap$HashTrieMap cannot be cast to org.apache.spark.SerializableWritable
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:140)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:196)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)

Has anyone run into something like this before?
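
For reference, here is a minimal version that hits the same code path (a sketch; the path is the same one as above, and take(1) instead of collect() just limits what comes back to the driver, since per the stack trace the failure happens earlier, in HadoopRDD.getPartitions / getJobConf):

val rdd = sc.textFile("hdfs://nameservice1/path/to/file.txt")
// Any action forces partition computation, which calls HadoopRDD.getJobConf,
// the frame where the ClassCastException above is thrown.
rdd.take(1).foreach(println)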

0 Answers:

No answers