Using Spark

Time: 2017-05-30 03:45:47

Tags: apache-spark spark-csv

I get an error when reading a local file in Apache Spark.

scala> val f = sc.textFile("/home/cloudera/Downloads/sample.txt")

f: org.apache.spark.rdd.RDD[String] = /home/cloudera/Downloads/sample.txt MapPartitionsRDD[9] at textFile at <console>:27

scala> f.count()

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Downloads/sample.txt
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1959)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
    at $iwC$$iwC$$iwC.<init>(<console>:43)
    at $iwC$$iwC.<init>(<console>:45)
    at $iwC.<init>(<console>:47)
    at <init>(<console>:49)
    at .<init>(<console>:53)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
    at org.apache.spark.repl.Main$.main(Main.scala:35)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

1 answer:

Answer 0 (score: 1):

You have to specify the scheme in the file path. When Spark is configured with Hadoop, a path without a scheme is resolved against the cluster's default filesystem, which here is HDFS (that is why the error shows hdfs://quickstart.cloudera:8020 prepended to your local path). To read from the local filesystem, prefix the path with file://:

sc.textFile("file:///home/cloudera/Downloads/sample.txt")

Hope this helps!