Error reading an RDD from local disk with PySpark

Date: 2017-09-28 08:24:56

Tags: pyspark jupyter-notebook rdd

I saved an RDD to local disk, and I am working in a Jupyter notebook with PySpark. When I try to load the RDD back, the job crashes saying the file cannot be found, even though the file is there. Any idea why?

a=sc.textFile('file:///myfile/test5.rdd')

Error message:

Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 303, abc.com, executor 3): 
java.io.FileNotFoundException: File file:/myfile/test5.rdd/part-00000 does not exist
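For context, the error names a part file (`part-00000`) rather than the path that was passed in: `rdd.saveAsTextFile()` writes a *directory* of `part-NNNNN` files, and `sc.textFile()` reads every part file under that directory. The stack trace also shows the failure on a remote executor (`abc.com, executor 3`), which must be able to see the same path as the driver. A minimal sketch of that directory layout, in plain Python with no Spark required (the paths and sample data are made up for illustration):

```python
import os
import tempfile

# Simulate the output layout of rdd.saveAsTextFile("…/test5.rdd"):
# the target is a directory containing part-00000, part-00001, …
out = os.path.join(tempfile.mkdtemp(), "test5.rdd")
os.makedirs(out)
for i, chunk in enumerate([["a", "b"], ["c"]]):  # two partitions
    with open(os.path.join(out, f"part-{i:05d}"), "w") as f:
        f.write("\n".join(chunk) + "\n")

# sc.textFile(out) would read every part file under the directory;
# with a file:// URL on a cluster, each executor looks for this path
# on its *own* local disk, which is why a directory that exists only
# on the driver's machine raises FileNotFoundException.
lines = []
for name in sorted(os.listdir(out)):
    with open(os.path.join(out, name)) as f:
        lines.extend(f.read().splitlines())
print(lines)  # ['a', 'b', 'c']
```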

0 answers:

No answers