Apache PySpark lost executors - Failed to create local dir

Date: 2015-06-28 19:53:42

Tags: python linux amazon-ec2 apache-spark pyspark

I am trying to perform a .leftOuterJoin in pyspark. I am using EC2, Anaconda, an IPython notebook, interactive mode, and Spark 1.3.0.

When I run the following code:

from pyspark import StorageLevel

success_rdd = keyedtrips_rdd.leftOuterJoin(success_rdd)
success_rdd = success_rdd.persist(StorageLevel.MEMORY_AND_DISK)
some_successes = success_rdd.take(100)

Spark fails roughly halfway through the job with the following message:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1361 in stage 16.0 failed 4 times, most recent failure: Lost task 1361.3 in stage 16.0 (TID 10438, ip-172-31-43-119.eu-west-1.compute.internal): java.io.IOException: Failed to create local dir in /mnt2/spark/spark-58043a43-5bfc-4610-a6bf-faae43b5ea5d/spark-c31061af-7fc0-45ab-b2ab-8f008005451d/spark-2ca18976-6219-4965-ac3b-aecf2e098cc1/blockmgr-40100c28-6c13-41c9-8617-9dfcf187040c/05.

Any help is greatly appreciated, I'm quite stuck on this. This related question may discuss the same problem, but I don't understand it. I have run leftOuterJoin before and never seen this error until now...

1 Answer:

Answer 0 (score: 1)

Make sure the SparkConf on your master defines spark.local.dir as a locally writable directory. It must be writable by the user you are running Spark as.


More information can be found here:

https://spark.apache.org/docs/latest/configuration.html
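As a minimal sketch of what this might look like in PySpark, the setting can be applied on the SparkConf before the SparkContext is created. The path /mnt/spark-local and the app name below are placeholders, not taken from the question; use a directory that exists and is writable by the Spark user on every node:

from pyspark import SparkConf, SparkContext

# Point Spark's scratch space at a directory the Spark user can write to.
# "/mnt/spark-local" is a hypothetical example path.
conf = (SparkConf()
        .setAppName("leftOuterJoin-debug")
        .set("spark.local.dir", "/mnt/spark-local"))
sc = SparkContext(conf=conf)

Note that on a managed cluster this value may be overridden by environment variables set by the cluster manager (such as SPARK_LOCAL_DIRS), as described in the configuration page linked above.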