I am trying to run a basic Scala/Spark example:
import org.apache.spark.sql.SparkSession

object LoadJsonWithSparkSQL {
  def main(args: Array[String]) {
    val master = "local"
    val inputFile = "/path/to/my/local/file"
    val warehouseLocation = "/path/to/spark-warehouse"

    // Build a local SparkSession with an explicit warehouse directory.
    val sparkSession = SparkSession.builder
      .master(master)
      .appName("LoadJsonWithSparkSQL")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .getOrCreate()

    // Read the JSON file into a DataFrame and print the inferred schema.
    val input = sparkSession.read.json(inputFile)
    input.printSchema()

    sparkSession.stop()
  }
}
This creates the Spark session. When it tries to read the JSON file, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.isDirectory()Z
at org.apache.spark.sql.execution.datasources.ListingFileCatalog$$anonfun$1$$anonfun$apply$2.apply(ListingFileCatalog.scala:129)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog$$anonfun$1$$anonfun$apply$2.apply(ListingFileCatalog.scala:116)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog$$anonfun$1.apply(ListingFileCatalog.scala:116)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog$$anonfun$1.apply(ListingFileCatalog.scala:102)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog.listLeafFiles(ListingFileCatalog.scala:102)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog.refresh(ListingFileCatalog.scala:75)
at org.apache.spark.sql.execution.datasources.ListingFileCatalog.<init>(ListingFileCatalog.scala:56)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:379)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:287)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:249)
at LoadJsonWithSparkSQL$.main(LoadJsonWithSparkSQL.scala:50)
at LoadJsonWithSparkSQL.main(LoadJsonWithSparkSQL.scala)
17/07/26 17:13:37 INFO spark.SparkContext: Invoking stop() from shutdown hook
Any ideas how to fix this?
My setup is:
Spark: 2.0.0
Scala: 2.10
All files are on my local FS.
Answer 0 (score: 0)
There are two options to choose from.
Use sc.textFile("file:///file-path/") if it is a text file.
Otherwise, if it is a JSON file, you can try a DataFrame:
df = sqlContext.read.json("file")
Try creating a DataFrame; a DataFrame makes it very easy to explore the data.
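For reference, here is a minimal self-contained Scala sketch of both options. It assumes the files live at the hypothetical local paths /path/to/data.txt and /path/to/data.json, and uses the Spark 2.x SparkSession entry point, which subsumes the older sqlContext shown above:

import org.apache.spark.sql.SparkSession

object ReadLocalFiles {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("ReadLocalFiles")
      .getOrCreate()

    // Option 1: read a plain text file line by line as an RDD[String].
    // The file:/// scheme explicitly targets the local filesystem.
    // /path/to/data.txt is a hypothetical path; substitute your own file.
    val lines = spark.sparkContext.textFile("file:///path/to/data.txt")
    println(s"Line count: ${lines.count()}")

    // Option 2: read a JSON file into a DataFrame; Spark infers the schema.
    // /path/to/data.json is likewise a hypothetical path.
    val df = spark.read.json("file:///path/to/data.json")
    df.printSchema()
    df.show()

    spark.stop()
  }
}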