I am running a Python script in a Jupyter notebook on Linux Mint.
The code itself is not very important, but here it is (it comes from a GraphFrames tutorial):
import pandas
import pyspark
from functools import reduce
from graphframes import *
from IPython.display import display, HTML
from pyspark.context import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import col, lit, when
from pyspark.sql.session import SparkSession
sc = SparkContext.getOrCreate()
sqlContext = SQLContext.getOrCreate(sc)
spark = SparkSession(sc)
vertices = sqlContext.createDataFrame(
    [
        ("a", "Alice", 34),
        ("b", "Bob", 36),
        ("c", "Charlie", 30),
        ("d", "David", 29),
        ("e", "Esther", 32),
        ("f", "Fanny", 36),
        ("g", "Gabby", 60),
    ],
    ["id", "name", "age"],
)
edges = sqlContext.createDataFrame(
    [
        ("a", "b", "friend"),
        ("b", "c", "follow"),
        ("c", "b", "follow"),
        ("f", "c", "follow"),
        ("e", "f", "follow"),
        ("e", "d", "friend"),
        ("d", "a", "friend"),
        ("a", "e", "friend"),
    ],
    ["src", "dst", "relationship"],
)
g = GraphFrame(vertices, edges)
display(g.inDegrees.toPandas())
The last line is the one that causes trouble; it raises the following error:
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
Both variables are set correctly:
printenv PYSPARK_PYTHON
-> /usr/bin/python3
printenv PYSPARK_DRIVER_PYTHON
-> /usr/bin/python3
I also added them to the spark-env.sh file as follows:
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
But the error persists. Where else do I have to update these variables?
Edit
python --version
Python 3.7.4
pip3 list | grep jupyter
jupyter 1.0.0
jupyter-client 5.3.4
jupyter-console 6.0.0
jupyter-core 4.6.1
jupyterlab 1.1.4
jupyterlab-server 1.0.6
pip3 list | grep pyspark
pyspark 2.4.4
Answer 0 (score: 1)
The problem is most likely a Python version conflict. Set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to /usr/bin/python.
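One possible reason the shell settings have no effect is that the Jupyter kernel was started from an environment that does not export them. The question does not confirm this, so treat the following as a minimal sketch under that assumption: it sets both variables from inside the notebook, before any SparkContext is created. It uses sys.executable (the interpreter running the kernel) rather than a hard-coded path, which is a swap-in choice and not part of the original answer.

```python
import os
import sys

# Assumption: the kernel did not inherit PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON
# from the shell. Point both at the interpreter that runs this notebook kernel,
# so the driver and the workers use the same Python version.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# Show what the kernel now sees, to compare with the printenv output in the shell.
print("kernel interpreter:", sys.executable)
print("PYSPARK_PYTHON:", os.environ["PYSPARK_PYTHON"])
print("PYSPARK_DRIVER_PYTHON:", os.environ["PYSPARK_DRIVER_PYTHON"])
```

Run this cell first. If a SparkContext already exists in the session, restarting the kernel before running it is likely necessary, because SparkContext.getOrCreate() would otherwise return the existing context.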
Alternatively, you can use a venv:
cd ~
python3 -m venv spark_test
cd spark_test
source ./bin/activate
pip3 install jupyterlab pyspark graphframes
jupyter notebook
You have to put your Jupyter notebook files in the newly created folder.
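To confirm that the notebook is really running on the interpreter from the virtual environment, a small check like the one below can be run in the first cell. The helper name running_in_venv is hypothetical and chosen for illustration. It relies on the fact that inside a venv, sys.prefix differs from sys.base_prefix.

```python
import sys


def running_in_venv() -> bool:
    """Return True when the current interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the environment and sys.base_prefix
    # at the Python installation it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# The path should lie inside the spark_test folder if the venv is active.
print("interpreter:", sys.executable)
print("in venv:", running_in_venv())
```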