I am trying to run pyspark in a Jupyter notebook, and it gives me the following exception:

Py4JJavaError: An error occurred while calling o190.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver): java.io.IOException: Failed to connect to /10.209.34.114:50701

What I have tried:

Here is my configuration (bash profile on macOS):
export SPARK_VERSION=`ls /usr/local/Cellar/apache-spark/ | sort | tail -1`
export SPARK_HOME="/usr/local/Cellar/apache-spark/$SPARK_VERSION/libexec"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export SPARK_PATH=~/spark-2.4.3-bin-hadoop2.7
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PACKAGES="io.delta:delta-core_2.12:0.1.0"
export PYSPARK_SUBMIT_ARGS="--packages ${PACKAGES} pyspark-shell"
export PYSPARK_PYTHON=python3
alias snotebook='$SPARK_PATH/bin/pyspark --packages com.databricks:spark-csv_2.$
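One thing I am unsure about is whether the notebook picks up a py4j version that matches this Spark install, since the profile hard-codes py4j-0.9-src.zip while Spark 2.4.x ships a newer zip (py4j-0.10.7-src.zip, as far as I know; the exact name may differ by version). This is a minimal check I ran from Python, using only the standard library and the SPARK_HOME set above:

import glob
import os

# Print the Spark installation the notebook sees and the py4j zip(s) it actually ships.
# If the PYTHONPATH entry pins py4j-0.9-src.zip but the glob below returns a
# different name, the PYTHONPATH entry points at a file that does not exist.
spark_home = os.environ.get("SPARK_HOME", "<not set>")
print("SPARK_HOME:", spark_home)
print("bundled py4j zips:", glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))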
Here is the sample code I am running:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark test").getOrCreate()

# A tiny two-row DataFrame; the exception is raised when show() materializes it
columns = ['id', 'dogs', 'cats']
vals = [
    (1, 2, 0),
    (2, 0, 1)
]
df = spark.createDataFrame(vals, columns)
df.show()
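Since the IOException mentions a failure to connect to my machine's LAN IP, I also sketched a variant that pins the driver to the loopback address before anything runs. spark.driver.host and spark.driver.bindAddress are standard Spark properties; choosing 127.0.0.1 is just my guess for a single-machine setup:

from pyspark.sql import SparkSession

# Force the driver to advertise and bind the loopback address instead of the
# LAN IP (10.209.34.114) that the executor fails to reach.
spark = (SparkSession.builder
         .appName("spark test")
         .config("spark.driver.host", "127.0.0.1")
         .config("spark.driver.bindAddress", "127.0.0.1")
         .getOrCreate())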
Full stack trace: