No FileSystem for scheme in Spark

Time: 2016-03-22 09:48:13

Tags: apache-spark hdfs

It works locally, but it fails on the Linux node. I built a fat jar with sbt-assembly.

Here is my plugins.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.7.7")

I tried to load a CSV file via the com.databricks.spark.csv package, without using HDFS. This is the part where I think the error occurs:

val logFile = "/spark/data/test.csv"
val rawdf = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // Use first line of all files as header
  .option("inferSchema", "true") // Automatically infer data types
  .load(logFile)

But somehow it doesn't work. Here is the log message:

16/03/22 18:36:24 INFO MemoryStore: ensureFreeSpace(213512) called with curMem=242504, maxMem=555755765
16/03/22 18:36:24 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 208.5 KB, free 529.6 MB)
16/03/22 18:36:24 INFO MemoryStore: ensureFreeSpace(19788) called with curMem=456016, maxMem=555755765
16/03/22 18:36:24 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 19.3 KB, free 529.6 MB)
16/03/22 18:36:24 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.2.18:42276 (size: 19.3 KB, free: 530.0 MB)
16/03/22 18:36:24 INFO SparkContext: Created broadcast 4 from textFile at TextFile.scala:30
16/03/22 18:36:24 INFO FileInputFormat: Total input paths to process : 1
Exception in thread "main" java.io.IOException: No FileSystem for scheme: d
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.spark.SparkHadoopWriter$.createPathFromString(SparkHadoopWriter.scala:170)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:988)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:897)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:896)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1430)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1409)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1409)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1409)
        at destiny.spark.PACase.make(PACase.scala:58)
        at TestPaApp$.main(TestPaApp.scala:114)
        at TestPaApp.main(TestPaApp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/22 18:36:24 INFO SparkContext: Invoking stop() from shutdown hook
16/03/22 18:36:24 INFO SparkUI: Stopped Spark web UI at http://192.168.2.18:4040
16/03/22 18:36:24 INFO DAGScheduler: Stopping DAGScheduler
16/03/22 18:36:24 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/03/22 18:36:24 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/03/22 18:36:24 INFO Master: Received unregister request from application app-20160322183602-0014
16/03/22 18:36:24 INFO Master: Removing app app-20160322183602-0014
16/03/22 18:36:24 INFO Worker: Asked to kill executor app-20160322183602-0014/0
16/03/22 18:36:24 INFO ExecutorRunner: Runner thread for executor app-20160322183602-0014/0 interrupted

1 Answer:

Answer 0 (score: 1)

It looks like you are trying to load a file named D:\... ; no such path exists on Linux, so D: gets parsed as the FileSystem scheme (like hdfs: or file:). Change your file name to something that is accessible from the Linux nodes (a local Linux file, or a path on some shared file system such as HDFS).
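For example, a minimal sketch of the same read and write with explicit schemes in the paths; the namenode host/port and the output directories below are placeholders, not values from the question, and would need to match your cluster:

val logFile = "hdfs://namenode:8020/spark/data/test.csv" // explicit hdfs: scheme instead of a Windows D:\ path
val rawdf = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(logFile)

// The failing call in the stack trace is saveAsTextFile, so the output path also
// needs a scheme the Linux nodes can resolve (both paths here are hypothetical):
// someRdd.saveAsTextFile("hdfs://namenode:8020/spark/output")
// someRdd.saveAsTextFile("file:///tmp/spark-output")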