Error: Must specify a primary resource (JAR or Python or R file) - IPython notebook

Date: 2015-07-02 20:08:18

Tags: apache-spark ipython pyspark

I'm trying to run Apache Spark in an IPython Notebook, following this instruction (and all the advice in the comments) - link

But when I launch the IPython Notebook with this command:

ipython notebook --profile=pyspark

I get this error:

Error: Must specify a primary resource (JAR or Python or R file)

If I run pyspark in the shell, everything works fine, which means I have some trouble connecting Spark with IPython.

By the way, here is my bash_profile:

export SPARK_HOME="$HOME/spark-1.4.0"
export PYSPARK_SUBMIT_ARGS='--conf "spark.mesos.coarse=true" pyspark-shell'

And this is in ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py:

# Configure the necessary Spark environment
import os
import sys

# Spark home
spark_home = os.environ.get("SPARK_HOME")

# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if not "pyspark-shell" in pyspark_submit_args: pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Add the spark python sub-directory to the path
sys.path.insert(0, spark_home + "/python")

# Add the py4j to the path.
# You may need to change the version number to match your install
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))

# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, "python/pyspark/shell.py"))
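The append logic in the setup script above can be exercised in isolation. This is a sketch: the `--conf` value below simply mirrors the bash_profile from the question, and the environment variable is set inside the script only to simulate the starting state:

```python
import os

# Simulate PYSPARK_SUBMIT_ARGS as it might arrive from bash_profile,
# but without the trailing "pyspark-shell" primary resource
# (the failure mode behind the error in the question).
os.environ["PYSPARK_SUBMIT_ARGS"] = '--conf "spark.mesos.coarse=true"'

# Same append logic as in 00-pyspark-setup.py:
# add "pyspark-shell" only if it is not already present.
pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
if "pyspark-shell" not in pyspark_submit_args:
    pyspark_submit_args += " pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

print(os.environ["PYSPARK_SUBMIT_ARGS"])
# → --conf "spark.mesos.coarse=true" pyspark-shell
```

Without that trailing `pyspark-shell` token, spark-submit sees no primary resource and raises exactly the "Must specify a primary resource" error.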

It may be relevant that I upgraded OS X to 10.10.4 yesterday.

1 Answer:

Answer 0 (score: 8):

I ran into a similar problem; I was also using spark-1.4.0 with the same 00-pyspark-setup.py file.

As Philippe Rossignol's comments on this blog explain, the following lines were added to the 00-pyspark-setup.py file because the pyspark-shell argument is needed in PYSPARK_SUBMIT_ARGS:

# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if not "pyspark-shell" in pyspark_submit_args: pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

However, there was no RELEASE file inside my spark-1.4.0 folder, so the if condition that appends pyspark-shell to PYSPARK_SUBMIT_ARGS was never satisfied.

As a kludgy workaround, I simply commented out the lines checking for the release file, leaving only the following:

pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
if not "pyspark-shell" in pyspark_submit_args: pyspark_submit_args += " pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
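A slightly less kludgy alternative is a sketch like the one below: instead of relying on the RELEASE file (which source builds may lack), parse the version out of the SPARK_HOME directory name. This assumes the usual spark-x.y.z naming seen in the question; the helper function and the example paths are illustrative, not part of the original script:

```python
import os
import re

def needs_pyspark_shell(spark_home):
    """Return True if the Spark version parsed from the directory
    name (e.g. "spark-1.4.0") is 1.4 or newer, i.e. the version
    that requires the "pyspark-shell" primary resource."""
    match = re.search(r"spark-(\d+)\.(\d+)", os.path.basename(spark_home))
    if not match:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 4)

print(needs_pyspark_shell("/Users/me/spark-1.4.0"))  # → True
print(needs_pyspark_shell("/Users/me/spark-1.3.1"))  # → False
```

The unconditional append above works fine for Spark 1.4.x; a check like this only matters if the same startup file has to serve older Spark installs as well.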