PySpark: spark-submit not found even though the path is correct

Asked: 2020-10-14 15:39:40

Tags: python pyspark environment-variables

I have set the following environment variables in my PyCharm debug configuration:

SPARK_HOME = /somewhere/spark-3.0.0-bin-hadoop2.7
PYTHONPATH = /somewhere/spark-3.0.0-bin-hadoop2.7/python
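
A quick way to confirm that these values actually reach the interpreter, and which pyspark ends up being imported, is to print them from inside the same run configuration (a minimal check; repr exposes stray quotes or trailing whitespace that a configuration field can silently pick up):

import os
import pyspark

# repr() exposes invisible characters in the variable values
print(repr(os.environ.get("SPARK_HOME")))
print(repr(os.environ.get("PYTHONPATH")))

# which pyspark module is actually imported (the traceback below
# shows the pip-installed copy in the venv, not SPARK_HOME/python)
print(pyspark.__file__)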

I am trying to run the following code:

import pyspark as ps
context = ps.SparkContext('myApp')
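
(As an aside, and unrelated to the crash below: the first positional parameter of SparkContext is master, not appName, so 'myApp' would later be parsed as a master URL. A sketch of what was presumably intended, with 'local[*]' as an illustrative choice of master:)

import pyspark as ps

# appName must be passed by keyword, or after an explicit master URL
context = ps.SparkContext(master="local[*]", appName="myApp")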

Either way, the code fails immediately along the following execution path:

  File "/somewhere/venv/lib/python3.7/site-packages/pyspark/context.py", line 325, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)

  File "/somewhere/venv/lib/python3.7/site-packages/pyspark/java_gateway.py", line 95, in launch_gateway
proc = Popen(command, **popen_kwargs)

  File "/usr/lib/python3.7/subprocess.py", line 800, in __init__
restore_signals, start_new_session)

  File "/usr/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)

  FileNotFoundError: [Errno 2] No such file or directory: '/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit': '/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit'
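
For what it is worth, the [Errno 2] comes straight from subprocess, not from Spark: Popen raises FileNotFoundError whenever the OS cannot resolve the executable it is handed. A minimal reproduction, independent of Spark:

from subprocess import Popen

try:
    # any unresolvable executable path produces the same error shape
    # as the spark-submit launch above
    Popen(["/no/such/dir/spark-submit"])
except FileNotFoundError as e:
    print(e)  # [Errno 2] No such file or directory: '/no/such/dir/spark-submit'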

Using the debugger, I can trace it to these lines in java_gateway.py:

56    SPARK_HOME = _find_spark_home()
57    # Launch the Py4j gateway using Spark's run command so that we pick up the
58    # proper classpath and settings from spark-env.sh
59    on_windows = platform.system() == "Windows"
60    script = "./bin/spark-submit.cmd" if on_windows else "./bin/spark-submit"
61    command = [os.path.join(SPARK_HOME, script)]

On line 56, SPARK_HOME correctly comes out as '/somewhere/spark-3.0.0-bin-hadoop2.7'. The error appears to be caused by line 60, whose string starts with ./, which produces the superfluous ./ seen in the final error message. Line 61 therefore builds:

'/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit'
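
Note, though, that a ./ segment in the middle of an absolute path is cosmetic as far as the OS is concerned; both spellings resolve to the same file. A quick sanity check:

import os

p1 = "/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit"
p2 = "/somewhere/spark-3.0.0-bin-hadoop2.7/bin/spark-submit"

# normpath collapses the redundant './' segment
print(os.path.normpath(p1) == p2)  # True

# and if the file exists, both paths refer to the same inode:
# print(os.path.samefile(p1, p2))  # True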

Still, having seen both spark-submit and spark-submit.cmd inside /somewhere/spark-3.0.0-bin-hadoop2.7/bin/, I tried to correct the path by replacing line 60 and dropping the offending ./ prefix:

60    script = "bin/spark-submit.cmd" if on_windows else "bin/spark-submit"

With that change, line 61 gives command the value:

'/somewhere/spark-3.0.0-bin-hadoop2.7/bin/spark-submit'

But it still fails, and the final message is now:

FileNotFoundError: [Errno 2] No such file or directory: '/somewhere/spark-3.0.0-bin-hadoop2.7/bin/spark-submit': '/somewhere/spark-3.0.0-bin-hadoop2.7/bin/spark-submit'

This cannot be right, because the file is at that location, and whereis spark-shell confirms this is the Spark folder. What am I doing wrong? I also feel uneasy about editing line 60 at all. Please advise; I am new to PySpark and already in too deep. Thanks.
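
When [Errno 2] is reported for a file that demonstrably exists, the usual suspects are invisible characters in the path (often inherited from an environment variable) or a script whose shebang interpreter is missing. A diagnostic sketch along those lines, using the paths from above:

import os

path = "/somewhere/spark-3.0.0-bin-hadoop2.7/bin/spark-submit"

print(os.path.exists(path))      # does this exact byte string resolve?
print(os.access(path, os.X_OK))  # is it executable by this user?

# the same errno can also mean the interpreter in the shebang is missing:
with open(path, "rb") as f:
    print(f.readline())          # e.g. b'#!/usr/bin/env bash\n'

# and if the path was assembled from SPARK_HOME, inspect that too:
print(repr(os.environ.get("SPARK_HOME")))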

I then opened a terminal and executed the very string that was reported as an invalid path:

/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit

and it happily produced plenty of output:

20/10/15 00:57:25 WARN Utils: Your hostname, fila-vm-00 resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
20/10/15 00:57:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
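
Since the terminal run succeeds, one way to narrow things down is to launch the very same path through subprocess from a plain Python shell outside PyCharm, which is essentially what java_gateway.py does; if that also works, the difference lies in PyCharm's run-configuration environment rather than in the path itself:

import subprocess

# same mechanism java_gateway.py uses: Popen on the spark-submit path
result = subprocess.run(
    ["/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit", "--version"],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)  # spark-submit writes --version to stderr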

So why does PyCharm complain?

0 Answers