Spark error when trying to read a JSON file from the local FS

Date: 2017-07-26 15:38:52

Tags: json scala apache-spark

I am trying to run a basic Scala/Spark example:

import org.apache.spark.sql.SparkSession

object LoadJsonWithSparkSQL {
  def main(args: Array[String]): Unit = {
    val master = "local"
    val inputFile = "/path/to/my/local/file"
    val warehouseLocation = "/path/to/spark-warehouse"

    // Build a local SparkSession with an explicit warehouse directory
    val sparkSession = SparkSession.builder
      .master(master)
      .appName("LoadJsonWithSparkSQL")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .getOrCreate()

    // Read the JSON file into a DataFrame and print its inferred schema
    val input = sparkSession.read.json(inputFile)
    input.printSchema()
    sparkSession.stop()
  }
}

The Spark session is created successfully, but when it tries to read the JSON file I get the following error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.isDirectory()Z
at org.apache.spark.sql.execution.datasources.ListingFileCatalog$$anonfun$1$$anonfun$apply$2.apply(ListingFileCatalog.scala:129)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog$$anonfun$1$$anonfun$apply$2.apply(ListingFileCatalog.scala:116)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog$$anonfun$1.apply(ListingFileCatalog.scala:116)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog$$anonfun$1.apply(ListingFileCatalog.scala:102)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog.listLeafFiles(ListingFileCatalog.scala:102)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog.refresh(ListingFileCatalog.scala:75)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog.<init>(ListingFileCatalog.scala:56)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:379)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:287)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:249)
at LoadJsonWithSparkSQL$.main(LoadJsonWithSparkSQL.scala:50)
at LoadJsonWithSparkSQL.main(LoadJsonWithSparkSQL.scala)

17/07/26 17:13:37 INFO spark.SparkContext: Invoking stop() from shutdown hook

Any ideas how to fix this?

My setup is:

Spark: 2.0.0

Scala: 2.10

All the files are on my local FS.
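
For reference, a minimal build.sbt matching this setup might look like the sketch below; the project name is a placeholder. Spark 2.0.0 is built against Hadoop 2.x, where FileStatus.isDirectory() exists (Hadoop 1.x only had isDir()), so an older Hadoop jar on the classpath is one plausible source of this NoSuchMethodError; pinning a consistent hadoop-client version here is an assumption, not a confirmed fix.

    // Hypothetical build.sbt for the setup above (Spark 2.0.0 on Scala 2.10).
    // The explicit hadoop-client pin is an assumption: it guards against an
    // older Hadoop 1.x jar that would lack FileStatus.isDirectory().
    name := "load-json-with-spark-sql"

    scalaVersion := "2.10.6"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"    % "2.0.0",
      "org.apache.spark" %% "spark-sql"     % "2.0.0",
      "org.apache.hadoop" % "hadoop-client" % "2.7.2"
    )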

1 Answer:

Answer 0 (score: 0)

There are two options. If it is a plain text file, you can use sc.textFile("file:///path/to/file/").
Otherwise, if it is a JSON file, you can read it into a DataFrame with df = sqlContext.read.json("file").
Please try creating a DataFrame; a DataFrame makes it very easy to explore the data.
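
A minimal, runnable sketch of both options, assuming the file lives on the local FS; the path is a placeholder, and sqlContext is the pre-2.0 entry point (on Spark 2.0 sparkSession.read.json behaves the same way):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ReadLocalFile {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local").setAppName("ReadLocalFile")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)

        // Option 1: a plain text file, read as an RDD of lines
        val lines = sc.textFile("file:///path/to/my/local/file")
        println(s"line count: ${lines.count()}")

        // Option 2: a JSON file, read into a DataFrame for easy exploration
        val df = sqlContext.read.json("file:///path/to/my/local/file")
        df.printSchema()
        df.show()

        sc.stop()
      }
    }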