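I am trying to integrate JupyterHub with PySpark. To do this, I installed JupyterHub and PySpark on my local Ubuntu machine and then did the configuration needed for JupyterHub to recognize PySpark as one of its kernels.
Whenever I create a notebook with the "Python3.6 + Pyspark(Spark 2.4.0)" kernel, the kernel I set up loads successfully, so the integration itself seems to work.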
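For reference, a kernel that makes PySpark importable is usually described by a `kernel.json` file. The sketch below builds such a spec as a Python dict. It is not the configuration used in this question: the display name, the `python3` interpreter and the `py4j-0.10.7-src.zip` file name are assumptions (check the actual zip under `$SPARK_HOME/python/lib`), and `build_pyspark_kernel_spec` is a hypothetical helper.

```python
import json
import os


def build_pyspark_kernel_spec(spark_home, python="python3",
                              py4j_zip="py4j-0.10.7-src.zip"):
    """Return a Jupyter kernel spec dict for an IPython kernel with PySpark.

    Hypothetical helper; py4j_zip is an assumed file name, verify it
    against $SPARK_HOME/python/lib.
    """
    spark_python = os.path.join(spark_home, "python")
    return {
        "display_name": "Python3.6 + Pyspark(Spark 2.4.0)",
        "language": "python",
        # Standard IPython kernel launch; Jupyter fills in {connection_file}.
        "argv": [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "env": {
            "SPARK_HOME": spark_home,
            # Make pyspark and the bundled py4j importable without pip.
            "PYTHONPATH": os.pathsep.join([
                spark_python,
                os.path.join(spark_python, "lib", py4j_zip),
            ]),
            # Run Python workers with the same interpreter as the kernel.
            "PYSPARK_PYTHON": python,
        },
    }


if __name__ == "__main__":
    home = os.path.expanduser("~/Programas/spark-2.4.0-bin-hadoop2.7")
    print(json.dumps(build_pyspark_kernel_spec(home), indent=2))
```

Writing the resulting JSON to a `kernel.json` inside a directory under `~/.local/share/jupyter/kernels/` (per user) or `/usr/local/share/jupyter/kernels/` (system-wide, visible to all JupyterHub users) registers the kernel; the directory name is up to you.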
However, the problem appears when I try to create a Spark context with the following code:
from pyspark import SparkContext
sc = SparkContext("local")
It raises the following error:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-1-c58704bbced4> in <module>()
3 from pyspark import SparkContext
4
----> 5 sc = SparkContext("local")
6
7 import random
~/Programas/spark-2.4.0-bin-hadoop2.7/python/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
113 """
114 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 115 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
116 try:
117 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
~/Programas/spark-2.4.0-bin-hadoop2.7/python/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
296 with SparkContext._lock:
297 if not SparkContext._gateway:
--> 298 SparkContext._gateway = gateway or launch_gateway(conf)
299 SparkContext._jvm = SparkContext._gateway.jvm
300
~/Programas/spark-2.4.0-bin-hadoop2.7/python/pyspark/java_gateway.py in launch_gateway(conf)
92
93 if not os.path.isfile(conn_info_file):
---> 94 raise Exception("Java gateway process exited before sending its port number")
95
96 with open(conn_info_file, "rb") as info:
Exception: Java gateway process exited before sending its port number
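As far as I can tell from the Spark 2.4 source, `launch_gateway` starts `$SPARK_HOME/bin/spark-submit` with the arguments taken from `PYSPARK_SUBMIT_ARGS` (default `pyspark-shell`), passes it a temporary file path through the `_PYSPARK_DRIVER_CONN_INFO_PATH` environment variable, and polls until the JVM writes its port there. The simplified model below (a hypothetical `wait_for_conn_info`, not the actual PySpark code) shows why any failure of `spark-submit` before start-up, such as Java not being found from the kernel's environment or malformed submit arguments, surfaces as this same message:

```python
import os
import shutil
import subprocess
import sys
import tempfile
import time


def wait_for_conn_info(cmd):
    """Start `cmd` and wait until it writes a connection-info file.

    Simplified model of the check in pyspark/java_gateway.py: the child
    learns the file path from an environment variable; if it exits before
    creating the file, the parent raises the same exception.
    """
    tmp_dir = tempfile.mkdtemp()
    conn_info_file = os.path.join(tmp_dir, "conn_info")
    env = dict(os.environ, _PYSPARK_DRIVER_CONN_INFO_PATH=conn_info_file)
    proc = subprocess.Popen(cmd, env=env)
    try:
        # Poll until the child either writes the file or exits.
        while proc.poll() is None and not os.path.isfile(conn_info_file):
            time.sleep(0.1)
        if not os.path.isfile(conn_info_file):
            raise Exception("Java gateway process exited before sending its port number")
        with open(conn_info_file, "rb") as info:
            return info.read()
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


if __name__ == "__main__":
    # A child that exits without writing the file reproduces the error.
    try:
        wait_for_conn_info([sys.executable, "-c", "raise SystemExit(1)"])
    except Exception as exc:
        print(exc)
```

Because `spark-submit` inherits the kernel process's stderr, the underlying Java or argument error typically ends up in the Jupyter single-user server's log rather than in the notebook output, so that log is worth checking.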
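I have read other threads from users with similar problems, but the solutions proposed there have not solved mine. My JAVA_HOME is set, I am using Java 8, and I have tried adding PYSPARK_SUBMIT_ARGS to the environment.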
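One detail worth checking: under JupyterHub, the single-user server (and therefore the kernel) does not necessarily inherit variables exported in a login shell such as `~/.bashrc`; by default the spawner forwards only a short list of variables (`Spawner.env_keep`). A small hypothetical helper like `check_spark_env` below can be run inside the notebook to see what the kernel itself sees. It also checks that `PYSPARK_SUBMIT_ARGS`, if set, ends with `pyspark-shell`, which PySpark 2.x expects; a value missing that suffix is a commonly reported cause of this exception.

```python
import os
import shutil


def check_spark_env(env=None):
    """Return a list of likely problems with the Java/PySpark environment.

    Hypothetical helper: pass a dict to test it, or nothing to inspect
    the environment of the current kernel (os.environ).
    """
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if java_home:
        # spark-class runs $JAVA_HOME/bin/java when JAVA_HOME is set.
        if not os.path.isfile(os.path.join(java_home, "bin", "java")):
            problems.append("JAVA_HOME does not contain bin/java: " + java_home)
    elif shutil.which("java", path=env.get("PATH")) is None:
        # Otherwise it falls back to a `java` executable on PATH.
        problems.append("JAVA_HOME is not set and no 'java' is on PATH")

    submit_args = env.get("PYSPARK_SUBMIT_ARGS")
    if submit_args is not None and not submit_args.strip().endswith("pyspark-shell"):
        problems.append("PYSPARK_SUBMIT_ARGS does not end with 'pyspark-shell'")

    return problems
```

For example, run `print(check_spark_env())` in a cell before `SparkContext("local")`. An empty list only means none of these particular checks failed, not that the gateway is guaranteed to start.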