Running pySpark in a Jupyter Notebook - Windows

Date: 2016-07-02 16:56:19

Tags: python pyspark jupyter

I want to run pySpark from a Jupyter notebook. I downloaded and installed Anaconda, which includes Jupyter. I wrote the following lines:

 from pyspark import SparkConf, SparkContext
 conf = SparkConf().setMaster("local").setAppName("My App")
 sc = SparkContext(conf = conf)

and I get the following error:

ImportError                               Traceback (most recent call last)
<ipython-input-3-98c83f0bd5ff> in <module>()
----> 1 from pyspark import SparkConf, SparkContext
      2 conf = SparkConf().setMaster("local").setAppName("My App")
      3 sc = SparkContext(conf = conf)

C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\__init__.py in <module>()
     39 
     40 from pyspark.conf import SparkConf
---> 41 from pyspark.context import SparkContext
     42 from pyspark.rdd import RDD
     43 from pyspark.files import SparkFiles

C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\context.py in <module>()
     26 from tempfile import NamedTemporaryFile
     27 
---> 28 from pyspark import accumulators
     29 from pyspark.accumulators import Accumulator
     30 from pyspark.broadcast import Broadcast

ImportError: cannot import name accumulators

Based on the Stack Overflow answer importing pyspark in python shell, I tried adding a PYTHONPATH environment variable pointing to the spark/python directory, but that did not help.

4 Answers:

Answer 0 (Score: 6)

This worked for me:

import os
import sys

# Root of the unpacked Spark distribution; a raw string so the backslash
# is not treated as an escape character
spark_path = r"D:\spark"

os.environ['SPARK_HOME'] = spark_path
os.environ['HADOOP_HOME'] = spark_path

# Put the Spark binaries and the bundled pyspark/py4j sources on the search path
sys.path.append(spark_path + "/bin")
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/pyspark/")
sys.path.append(spark_path + "/python/lib")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
sys.path.append(spark_path + "/python/lib/py4j-0.9-src.zip")

from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext("local", "test")

Verification:

In [2]: sc
Out[2]: <pyspark.context.SparkContext at 0x707ccf8>
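
As a further smoke test (a minimal sketch, reusing the sc created above), you can run a small job and check the result:

In [3]: sc.parallelize(range(10)).sum()
Out[3]: 45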

Answer 1 (Score: 0)

2018 version

Install PYSPARK with JUPYTER-NOTEBOOK on Windows 10 via ANACONDA NAVIGATOR

Step 1

Download the packages

1) spark-2.2.0-bin-hadoop2.7.tgz Download

2) Java JDK 8 Download

3) Anaconda v5.2 Download

4) scala-2.12.6.msi Download

5) Hadoop v2.7.1 Download

Step 2

Create a spark folder in the C:/ drive and put everything from the downloads inside it. It will look like this.

Note: during the Scala installation, set Scala's install path to a location inside the spark folder

Step 3

Now set the new Windows environment variables (a quick verification sketch follows this list):

  1. HADOOP_HOME=C:\spark\hadoop

  2. JAVA_HOME=C:\Program Files\Java\jdk1.8.0_151

  3. SCALA_HOME=C:\spark\scala\bin

  4. SPARK_HOME=C:\spark\spark\bin

  5. PYSPARK_PYTHON=C:\Users\user\Anaconda3\python.exe

  6. PYSPARK_DRIVER_PYTHON=C:\Users\user\Anaconda3\Scripts\jupyter.exe

  7. PYSPARK_DRIVER_PYTHON_OPTS=notebook

  8. Finally, select the Windows "Path" variable

    Click Edit and add

    "C:\spark\spark\bin" to the "Path" variable
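
Once the variables are set, a quick sanity check (a minimal sketch; run it in a fresh Python session started from the Anaconda prompt, so the new values are picked up) is to print them:

import os

# Any None printed below means that variable is not yet visible to Python
for var in ("HADOOP_HOME", "JAVA_HOME", "SCALA_HOME", "SPARK_HOME",
            "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON", "PYSPARK_DRIVER_PYTHON_OPTS"):
    print(var, "=", os.environ.get(var))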

Step 4

  • Create a folder in which to store your Jupyter-Notebook outputs and files
  • After that, open the Anaconda command prompt and cd into that folder
  • Then type pyspark

That's it: your browser will pop up with Jupyter on localhost

Step 5

Check whether pyspark is working!

Type a simple piece of code and run it:

from pyspark.sql import Row

# Build a simple Row; in Spark 2.x keyword fields are sorted alphabetically,
# so this prints something like: a:  Row(age=22, height=165, name='Vinay')
a = Row(name='Vinay', age=22, height=165)
print("a: ", a)

Answer 2 (Score: 0)

Running pySpark in a Jupyter Notebook - Windows

Java 8: https://www.guru99.com/install-java.html

Anaconda: https://www.anaconda.com/distribution/

Pyspark in Jupyter: https://changhsinlee.com/install-pyspark-windows-jupyter/

Answer 3 (Score: 0)

To run pyspark from a Jupyter notebook, we first need to install the findspark package. Try installing it by running the following command in the Anaconda prompt:

conda install -c conda-forge findspark

For a step-by-step walkthrough of installing pyspark with Jupyter Notebook, see the following URL: https://learntospark.blogspot.com/2019/12/configure-pyspark-with-jupyter-notebook.html
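
Once findspark is installed, a minimal usage sketch (assuming SPARK_HOME is set, or passing the Spark installation path to init() explicitly) looks like this:

import findspark
findspark.init()  # locates Spark via SPARK_HOME; or findspark.init("C:/spark/spark-2.2.0-bin-hadoop2.7")

# After init(), pyspark is importable like any other package
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("test").getOrCreate()
print(spark.range(5).count())  # expected output: 5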