PySpark on Jupyter: connecting to Spark on a remote server

Date: 2017-10-22 16:41:46

Tags: pyspark anaconda jupyter

I have Spark 2.1 configured on a remote Linux server (an IBM RHEL Z system). I am trying to create a SparkContext as follows:

from pyspark.context import SparkContext, SparkConf
master_url="spark://<IP>:7077"
conf = SparkConf()
conf.setMaster(master_url)
conf.setAppName("App1")
sc = SparkContext.getOrCreate(conf)

I get the error below. When I run the same code on the remote server in the pyspark shell, it works fine.

The currently active SparkContext was created at:

(No active SparkContext.)

    at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:100)
    at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1768)
    at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2411)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:236)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

1 answer:

Answer 0 (score: 0)

It sounds like you haven't set Jupyter as the PySpark driver. Before you can control PySpark from Jupyter, you must first set `PYSPARK_DRIVER_PYTHON=jupyter` and `PYSPARK_DRIVER_PYTHON_OPTS='notebook'`. If I remember correctly, the script at `libexec/bin/pyspark` (on OS X) contains notes on setting up a Jupyter notebook.
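A minimal sketch of that setup, assuming Spark is installed and `$SPARK_HOME` points at it (the master URL is the placeholder from the question):

```shell
# Tell the pyspark launcher to start Jupyter as the driver process
# instead of the plain Python REPL.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

# Launching pyspark now opens a notebook server with `sc` pre-created,
# already connected to the standalone master:
# $SPARK_HOME/bin/pyspark --master spark://<IP>:7077
```

With this approach the launcher script creates the SparkContext for you, so the notebook should not construct a second one with `SparkContext.getOrCreate`.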