PySpark installation - Failed to find Spark jars directory

Posted: 2017-06-23 07:41:26

Tags: java python hadoop apache-spark pyspark

I am running into a lot of problems with Spark on Windows, so let me explain the error:

There are plenty of tutorials on installing it and fixing the usual issues, but I have been trying for hours and still cannot get it to work.

I have Java 8, and it is on my System Path:

C:\>java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

I also have Python 2.7 with Anaconda 4.4:

C:\Program Files (x86)\Spark\python\dist>python -V
Python 2.7.13 :: Anaconda 4.4.0 (64-bit)

Just in case, I also have Scala, SBT and GOW:

C:\>scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

C:\>gow -version
Gow 0.8.0 - The lightweight alternative to Cygwin

C:\>sbt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
> about
[info] This is sbt 0.13.15

So, on to the installation:

  1. First I downloaded Spark 2.1.1, package type pre-built for Apache Hadoop 2.7 and later.

  2. I extracted it into a folder, say C:\Programs\Spark.

  3. In the python folder, I ran python setup.py sdist, which should generate the proper tgz file for pip.

  4. Going into dist, I ran pip install NAME_OF_PACKAGE.tgz. It did install it, because conda list shows:

    C:\>conda list
    # packages in environment at C:\Program Files (x86)\Anaconda2:
    #
    ...
    pyspark                   2.1.1+hadoop2.7           <pip>
    ...
    

    I still had some doubts, so I went to Anaconda's Scripts and site-packages folders. Both have what I expected: Scripts contains pyspark, spark-shell and so on, and the pyspark folder inside site-packages contains everything from the jars folder to its own bin folder, which holds the scripts just mentioned.

  5. Regarding Hadoop, I downloaded winutils.exe and pasted it into Spark's bin folder, which also puts it into the pip-installed pyspark's bin folder (the sketch right after this list checks exactly this layout).

  6. With that in place, I can import pyspark without problems:

    C:\Users\Rolando Casanueva>python
    Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    Anaconda is brought to you by Continuum Analytics.
    Please check out: http://continuum.io/thanks and https://anaconda.org
    >>> import pyspark
    >>> 
    
  7. First question: do I also have to paste winutils.exe into python's Scripts folder?
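
As a sanity check on steps 4 and 5 (and on the first question about where winutils.exe needs to live), here is a rough sketch I put together; it only uses the package location plus plain os calls, so the printed paths are whatever my machine happens to have, nothing official from Spark:

    # Rough check of the pip-installed pyspark layout (nothing Spark-specific
    # beyond importing the package; paths differ per machine).
    import os
    import pyspark

    pkg_dir = os.path.dirname(pyspark.__file__)
    print("pyspark package dir:", pkg_dir)

    # The pip package from step 4 should carry its own bin/ and jars/ folders.
    for sub in ("bin", "jars"):
        print(sub, "folder exists:", os.path.isdir(os.path.join(pkg_dir, sub)))

    # Step 5 pasted winutils.exe into Spark's bin, which is mirrored into the package bin.
    print("winutils.exe in package bin:",
          os.path.isfile(os.path.join(pkg_dir, "bin", "winutils.exe")))

    # Environment variables the launcher scripts may read (these can legitimately be unset).
    print("SPARK_HOME  =", os.environ.get("SPARK_HOME"))
    print("HADOOP_HOME =", os.environ.get("HADOOP_HOME"))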

Getting to the main issue: the problem appears when using pyspark, and it raises this exception:

    C:\Users\Rolando Casanueva>python
    Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    Anaconda is brought to you by Continuum Analytics.
    Please check out: http://continuum.io/thanks and https://anaconda.org
    >>> import pyspark
    >>> pyspark.SparkContext()
    C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark
    "Files" no se reconoce como un comando interno o externo, 
    programa o archivo por lotes ejecutable.
    Failed to find Spark jars directory.
    You need to build Spark before running this program.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\context.py", line 115, in __init__
        SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
      File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\context.py", line 259, in _ensure_initialized
        SparkContext._gateway = gateway or launch_gateway(conf)
      File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\java_gateway.py", line 96, in launch_gateway
        raise Exception("Java gateway process exited before sending the driver its port number")
    Exception: Java gateway process exited before sending the driver its port number
    >>>
    

• I also installed Spark as a Jupyter complement, following this video:
  https://www.youtube.com/watch?v=omlwDosMGVk

• Finally, I tried this guide:
  https://mas-dse.github.io/DSE230/installation/windows/

Every installation shows the same error.
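
For reference, below is a minimal sketch of the call that fails, with SPARK_HOME and HADOOP_HOME pinned to the distribution extracted in step 2 (C:\Programs\Spark, a path without spaces, with winutils.exe already in its bin folder). The exact values are only my assumption about a sane setup; I have not confirmed that this makes the gateway start:

    # Sketch: point the launcher at the extracted distribution before starting the JVM gateway.
    # C:\Programs\Spark is the folder from step 2; adjust if it was extracted elsewhere.
    import os

    os.environ["SPARK_HOME"] = r"C:\Programs\Spark"
    os.environ["HADOOP_HOME"] = r"C:\Programs\Spark"   # winutils.exe sits in ...\Spark\bin

    import pyspark

    # Same call that currently dies with "Java gateway process exited ...".
    sc = pyspark.SparkContext("local[*]", "smoke-test")
    print("Spark version:", sc.version)
    sc.stop()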

Second question: how do I solve this?

Extra question: is there any other recommended way to install it?

0 Answers:

No answers yet.