I have been using PyCharm to edit Python code for a while, but I recently started getting the following exception:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver): java.io.IOException: Cannot run program "python3.7": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:197)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:122)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:95)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:109)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 16 more
The exception is raised when I run the following:
# Read data
sample_path = 'path/to/file'
sample_data = spark_sql_context.read.option("delimiter", "\t").csv(sample_path, schema = INPUT_DF_SCHEMA)
sample_data.show()
# Select BPID
sample_data = sample_data.select('idd')
sample_data.show()
# Convert to RDD
sample_data_rdd = sample_data.rdd.map(list)
print(sample_data_rdd.take(2))
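What surprises me is that this code used to run fine, and I am not sure why I am seeing this error now. As suggested in other related threads and questions, I have checked my environment setup.
Here is a snapshot of my bash_profile: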
export PATH=$PATH:/usr/local/opt/python/libexec/bin
export PYSPARK_PYTHON=python3.7
When I type the following at the command line:
/usr/local/opt/python/libexec/bin/python
I get the following output:
Python 3.7.6 (default, Jan 31 2020, 15:07:01)
[Clang 9.0.0 (clang-900.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
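Note that the command tested above is `python` given by absolute path, whereas PYSPARK_PYTHON holds the bare name `python3.7`, which has to be found through PATH. A minimal diagnostic sketch, run from inside PyCharm, could show whether that name resolves in the environment the driver process actually sees. The helper name `resolve_worker_python` is hypothetical, and the `python3` fallback is only an assumption for the case where the variable is unset. It may also be worth knowing that a PyCharm started from the Dock does not necessarily inherit the PATH exported in bash_profile.

```python
import os
import shutil
import sys


def resolve_worker_python(env=None):
    """Return (name, resolved_path) for the interpreter named by PYSPARK_PYTHON.

    resolved_path is None when the name cannot be found as an executable,
    which is the situation reported as "error=2, No such file or directory".
    """
    env = os.environ if env is None else env
    # Assumption: fall back to "python3" when PYSPARK_PYTHON is not set.
    name = env.get("PYSPARK_PYTHON", "python3")
    # Look the name up on the PATH from the same environment mapping.
    return name, shutil.which(name, path=env.get("PATH", os.defpath))


if __name__ == "__main__":
    name, resolved = resolve_worker_python()
    print("driver interpreter:", sys.executable)
    print("PYSPARK_PYTHON    :", name)
    print("resolves to       :", resolved)
```

If the last line prints None inside PyCharm but a real path in a terminal, the two environments differ, and the run configuration's environment variables would be the place to look.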
Why might this not be working?
PS: The following Python installations appear to be present on my Mac: