I am a beginner with Spark and am trying to follow the instructions here on how to initialize the Spark shell from Python using cmd: http://spark.apache.org/docs/latest/quick-start.html
But when I run the following in cmd:
C:\Users\Alex\Desktop\spark-1.4.1-bin-hadoop2.4>c:\Python27\python bin\pyspark
I then get the following error message:
File "bin\pyspark", line 21
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
SyntaxError: invalid syntax
What am I doing wrong here?
P.S. In cmd I also tried just C:\Users\Alex\Desktop\spark-1.4.1-bin-hadoop2.4>bin\pyspark
and then I got "'python' is not recognized as an internal or external command, operable program or batch file".
Answer 0 (score: 2)
You need Python on your system PATH; you can add it with setx:
setx path "%path%;C:\Python27"
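Note that setx only affects processes started afterwards, so open a fresh cmd window before retrying. As a quick sanity check, here is a minimal sketch (my own addition, not part of the original answer) that prints every PATH entry so you can confirm C:\Python27 is present:

import os

# Print each PATH entry; C:\Python27 should appear once setx has taken effect
for entry in os.environ["PATH"].split(os.pathsep):
    print(entry)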
Answer 1 (score: 1)
I am a fairly new Spark user (as of today, really). I am using Spark 1.6.0 on Windows 10 and 7 machines. The following worked for me:
import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Put Spark's Python bindings and the bundled py4j on the module search path
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

# Launch the interactive PySpark shell (execfile is Python 2 only)
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
Using the code above, I was able to launch Spark in an IPython notebook and in my Enthought Canopy Python IDE. Before that, I could only start pyspark through a cmd prompt. The code above works only if the environment variables are set correctly for Python and Spark (pyspark).
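Note that execfile() exists only in Python 2. Under Python 3, a minimal equivalent sketch (assuming the same SPARK_HOME layout as above) would be:

import os

spark_home = os.environ['SPARK_HOME']  # assumes SPARK_HOME is already set
shell_path = os.path.join(spark_home, 'python', 'pyspark', 'shell.py')
with open(shell_path) as f:
    exec(f.read())  # Python 3 replacement for execfile()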
Answer 2 (score: 0)
With a pointer and some help from user "maxymoo", I was able to find a way to set a permanent PATH in Windows 7 as well. The instructions are here:
http://geekswithblogs.net/renso/archive/2009/10/21/how-to-set-the-windows-path-in-windows-7.aspx
Answer 3 (score: 0)
Whenever I start pyspark in ipython, I run these path settings:
import os
import sys
# Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"') for R
### Restart using: ipython notebook --profile=pyspark --packages com.databricks:spark-csv_2.10:1.0.3
os.environ['SPARK_HOME']="G:/Spark/spark-1.5.1-bin-hadoop2.6"
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/bin")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/pyspark/")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/pyspark/mllib")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/lib")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/lib/pyspark.zip")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/lib/pyspark.zip")
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext  # SQLContext lives in pyspark.sql, not pyspark
## sc.stop()  # if you wish to stop an existing context
sc = SparkContext("local", "Simple App")
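Once sc exists, a quick smoke test (my own addition, not part of the original answer) confirms the context actually works:

# Parallelize a tiny list and count it; a healthy context prints 4
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.count())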
Answer 4 (score: 0)
Just set the path under System -> Environment Variables -> Path.
The paths must be separated by ";" and there must be no spaces between them.
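As a final sanity check, a small sketch (an illustrative addition, not from the original answer) that flags PATH entries with stray whitespace around the ";" separators:

import os

# Entries with leading or trailing spaces typically fail to resolve on Windows
for entry in os.environ["PATH"].split(";"):
    if entry != entry.strip():
        print("Suspicious PATH entry: %r" % entry)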