Running wordcount on Spark

>>> lines = sc.textFile("README.md")
>>> lines.count()

Time: 2016-06-24 04:25:51

Tags: pyspark


Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/shubhranshu/Documents/spark/spark-1.6.1-bin-hadoop2.6/bin/README.md
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:58)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
	at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
	at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)

1 answer:

Answer 0 (score: 0)

You need to provide the correct path to README.md. The error shows that the relative path is resolved against the directory the shell was started from (here, the `bin/` directory of the Spark distribution), so the correct code is:

lines = sc.textFile("../README.md")
lines.count()
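A minimal sketch (plain Python, no Spark needed) of why this happens: `sc.textFile` with a `file:` path resolves relative paths against the driver's current working directory, so checking the resolved path before calling Spark avoids the `InvalidInputException`. The `resolve_input_path` helper below is hypothetical, not part of any Spark API:

```python
import os

def resolve_input_path(path):
    """Return an absolute path for `path`, falling back to the parent
    directory (as in the answer above) when the file is not found in
    the current working directory. Raises if neither location exists."""
    candidate = os.path.abspath(path)
    if os.path.exists(candidate):
        return candidate
    fallback = os.path.abspath(os.path.join("..", path))
    if os.path.exists(fallback):
        return fallback
    raise FileNotFoundError(
        "Input path does not exist: %s (also tried %s)" % (candidate, fallback)
    )

# In a PySpark shell one would then use the resolved path, e.g.:
# lines = sc.textFile(resolve_input_path("README.md"))
# lines.count()
```

This mirrors what Hadoop's `FileInputFormat` does internally: it lists the input path first and raises `InvalidInputException` when nothing is there, which is exactly the failure in the traceback above.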