I want to run pySpark from a Jupyter notebook. I downloaded and installed Anaconda, which includes Jupyter, and wrote the following lines:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf = conf)
I get the following error:
ImportError Traceback (most recent call last)
<ipython-input-3-98c83f0bd5ff> in <module>()
----> 1 from pyspark import SparkConf, SparkContext
2 conf = SparkConf().setMaster("local").setAppName("My App")
3 sc = SparkContext(conf = conf)
C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\__init__.py in <module>()
39
40 from pyspark.conf import SparkConf
---> 41 from pyspark.context import SparkContext
42 from pyspark.rdd import RDD
43 from pyspark.files import SparkFiles
C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\context.py in <module>()
26 from tempfile import NamedTemporaryFile
27
---> 28 from pyspark import accumulators
29 from pyspark.accumulators import Accumulator
30 from pyspark.broadcast import Broadcast
ImportError: cannot import name accumulators
I tried adding a PYTHONPATH environment variable pointing to the spark/python directory, based on the Stack Overflow answer to "importing pyspark in python shell", but it did not help.
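Before changing more variables, it can help to check which copy of pyspark (if any) the notebook kernel would actually import, and which sys.path entries mention Spark. This is a small diagnostic sketch using only the standard library; `locate_module` and `spark_path_entries` are hypothetical helper names, and matching on the substring "spark" is only a heuristic.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def spark_path_entries(paths=None):
    """Return the sys.path entries whose text contains 'spark' (case-insensitive)."""
    paths = sys.path if paths is None else paths
    return [p for p in paths if "spark" in p.lower()]


if __name__ == "__main__":
    # Run this in a notebook cell to see what the kernel sees.
    print("pyspark would be imported from:", locate_module("pyspark"))
    for entry in spark_path_entries():
        print("  sys.path entry:", entry)
```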
Answer 0 (score: 6)
This worked for me:
import os
import sys
spark_path = r"D:\spark"  # raw string so the backslash is kept literally
os.environ['SPARK_HOME'] = spark_path
os.environ['HADOOP_HOME'] = spark_path
sys.path.append(spark_path + "/bin")
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/pyspark/")
sys.path.append(spark_path + "/python/lib")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
# The py4j file name must match the zip in your Spark's python/lib folder
sys.path.append(spark_path + "/python/lib/py4j-0.9-src.zip")
from pyspark import SparkContext
from pyspark import SparkConf
sc = SparkContext("local", "test")
To verify:
In [2]: sc
Out[2]: <pyspark.context.SparkContext at 0x707ccf8>
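The hard-coded py4j-0.9-src.zip name above only matches the py4j version bundled with that particular Spark release. Here is a minimal sketch that looks the zip up instead, assuming the standard layout of a Spark binary distribution (`<spark_home>/python` and `<spark_home>/python/lib/py4j-*-src.zip`). `add_spark_to_path` is a hypothetical helper, not part of Spark.

```python
import glob
import os
import sys


def add_spark_to_path(spark_home):
    """Put Spark's Python sources and its bundled py4j zip at the front of sys.path.

    Assumes the usual binary-distribution layout:
    <spark_home>/python and <spark_home>/python/lib/py4j-*-src.zip
    """
    python_dir = os.path.join(spark_home, "python")
    lib_dir = os.path.join(python_dir, "lib")
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*-src.zip found in " + lib_dir)
    # A Spark distribution normally ships exactly one py4j zip; take the first match.
    added = [python_dir, py4j_zips[0]]
    # Insert in reverse so the final order is python_dir, then the py4j zip.
    for entry in reversed(added):
        if entry not in sys.path:
            sys.path.insert(0, entry)
    os.environ["SPARK_HOME"] = spark_home
    return added
```

Usage would be `add_spark_to_path(r"D:\spark")` in the first notebook cell, after which `from pyspark import SparkContext` should resolve to that installation. Prepending rather than appending means this copy wins over any other pyspark already on the path.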
Answer 1 (score: 0)
2018 version
Installing PySpark on Windows 10 with Jupyter Notebook via Anaconda Navigator
Download the following packages:
1) spark-2.2.0-bin-hadoop2.7.tgz
2) Java JDK 8
3) Anaconda v5.2
4) scala-2.12.6.msi
5) Hadoop v2.7.1
Create a folder named spark on the C:/ drive and put everything in it.
Note: when installing Scala, set its installation path to a folder inside the spark folder.
Now set the following new Windows environment variables:
HADOOP_HOME=C:\spark\hadoop
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_151
SCALA_HOME=C:\spark\scala\bin
SPARK_HOME=C:\spark\spark
PYSPARK_PYTHON=C:\Users\user\Anaconda3\python.exe
PYSPARK_DRIVER_PYTHON=C:\Users\user\Anaconda3\Scripts\jupyter.exe
PYSPARK_DRIVER_PYTHON_OPTS=notebook
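A quick way to catch typos in these variables before launching Jupyter is to check that each one is set and points to an existing file or folder. This sketch checks only a subset of the variables above (the names in `REQUIRED_VARS` are a choice for illustration, not a Spark requirement). It also flags a SPARK_HOME that ends in `bin`, because Spark's launch scripts expect SPARK_HOME to be the root of the Spark folder (the one containing `bin` and `python`). `check_spark_env` is a hypothetical helper name.

```python
import os

# Subset of the variables set above; adjust to your own install.
REQUIRED_VARS = ["JAVA_HOME", "HADOOP_HOME", "SPARK_HOME", "PYSPARK_PYTHON"]


def check_spark_env(env=None):
    """Return a list of human-readable problems with the Spark-related variables."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(name + " is not set")
        elif not os.path.exists(value):
            problems.append(name + " points to a missing path: " + value)
    spark_home = env.get("SPARK_HOME")
    # SPARK_HOME must be the Spark root; the scripts append \bin themselves.
    if spark_home and os.path.basename(os.path.normpath(spark_home)).lower() == "bin":
        problems.append("SPARK_HOME should be the Spark root folder, not its bin folder")
    return problems


if __name__ == "__main__":
    for problem in check_spark_env():
        print("Problem:", problem)
```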
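Now add Spark to the Windows Path: select the "Path" variable, click Edit, and add "C:\spark\spark\bin".
That's it. Open a new command prompt and run pyspark; because PYSPARK_DRIVER_PYTHON points to Jupyter, your browser will open Jupyter on localhost.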
Check that pyspark is working!
Enter some simple code and run it:
from pyspark.sql import Row
a = Row(name='Vinay', age=22, height=165)
print("a: ", a)
Answer 2 (score: 0)
Running pySpark in a Jupyter notebook on Windows
Java 8: https://www.guru99.com/install-java.html
Anaconda: https://www.anaconda.com/distribution/
PySpark in Jupyter: https://changhsinlee.com/install-pyspark-windows-jupyter/
Answer 3 (score: 0)
To run pyspark from a Jupyter notebook, first install the findspark package. You can install it by running the following command at the Anaconda Prompt:
conda install -c conda-forge findspark
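Then call findspark.init() in the notebook before importing pyspark; it uses SPARK_HOME to locate Spark, or you can pass the Spark folder as an argument.
For step-by-step instructions on setting up pyspark in Jupyter Notebook, see https://learntospark.blogspot.com/2019/12/configure-pyspark-with-jupyter-notebook.html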