I am trying to connect to a Spark master remotely, but I get the error: "Java gateway process exited before sending the driver its port number".
from pyspark.sql import SparkSession
master = "spark://192.168.56.102:7077"
spark = SparkSession.builder.master(master).appName("spark session").getOrCreate()
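Before suspecting pyspark itself, it can help to confirm that the standalone master's port is reachable from the client at all. The sketch below is a hypothetical diagnostic, not part of pyspark: `parse_master_url` and `master_reachable` are illustrative names, and it only checks that a plain TCP connection to the host and port in the `spark://` URL succeeds, using the standard library.

```python
import socket
from urllib.parse import urlparse


def parse_master_url(master: str) -> tuple[str, int]:
    """Split a spark://host:port master URL into (host, port)."""
    parsed = urlparse(master)
    if parsed.scheme != "spark" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"not a spark://host:port URL: {master!r}")
    return parsed.hostname, parsed.port


def master_reachable(master: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the master's host:port succeeds."""
    host, port = parse_master_url(master)
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `master_reachable("spark://192.168.56.102:7077")` returning `False` would point at networking between the host and the VM (firewall, host-only adapter) rather than at the local Java launcher.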
The Spark master is running in a CentOS virtual machine (standalone mode).
EDIT
To make the callback socket listen on the VirtualBox network interface, I edited java_gateway.py (from the pyspark package), lines 62-70:
# Start a socket that will be used by PythonGatewayServer to communicate its port to us
callback_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# >>>>>>>>>>>>>>>>
#callback_socket.bind(('127.0.0.1', 0)) # commented
callback_socket.bind(('192.168.56.1', 0)) # new line
callback_socket.listen(1)
callback_host, callback_port = callback_socket.getsockname()
env = dict(os.environ)
env['_PYSPARK_DRIVER_CALLBACK_HOST'] = callback_host
env['_PYSPARK_DRIVER_CALLBACK_PORT'] = str(callback_port)
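To see what the edited lines take part in, here is a simplified model of the callback handshake. It is an illustration under stated assumptions, not pyspark's code: a thread stands in for the gateway JVM (which pyspark starts as a local child process), the names `fake_gateway` and `receive_gateway_port` are hypothetical, and the framing (the port sent as a 4-byte big-endian int) is an assumption for illustration; real pyspark versions may also exchange an auth secret. The point it shows is that `bind()` picks which local address the child dials back to, and that if the child exits before writing its port, the listener sees EOF, which is the situation the error message describes.

```python
import socket
import struct
import threading


def fake_gateway(callback_host: str, callback_port: int, gateway_port: int) -> None:
    """Hypothetical stand-in for the gateway JVM: connect back and report a port."""
    with socket.create_connection((callback_host, callback_port)) as conn:
        # Assumed framing: port as a 4-byte big-endian signed int.
        conn.sendall(struct.pack("!i", gateway_port))


def receive_gateway_port(gateway_port: int, bind_host: str = "127.0.0.1",
                         timeout: float = 5.0) -> int:
    """Listen on an ephemeral port, start the fake gateway, read back its port."""
    callback_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # bind() raises OSError if bind_host is not an address of a local interface.
        callback_socket.bind((bind_host, 0))
        callback_socket.listen(1)
        callback_socket.settimeout(timeout)
        callback_host, callback_port = callback_socket.getsockname()

        # pyspark hands these to its child via environment variables;
        # here they are passed straight to a thread.
        worker = threading.Thread(
            target=fake_gateway, args=(callback_host, callback_port, gateway_port)
        )
        worker.start()
        conn, _ = callback_socket.accept()
        with conn:
            data = b""
            while len(data) < 4:
                chunk = conn.recv(4 - len(data))
                if not chunk:
                    raise EOFError("gateway exited before sending its port number")
                data += chunk
        worker.join()
        return struct.unpack("!i", data)[0]
    finally:
        callback_socket.close()
```

In this model the child connects back on the same machine, so changing the bind address only changes which local interface is used; it does not involve the remote master.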
But I still get the same error when I try to connect: "Java gateway process exited before sending the driver its port number".