Different Python version in worker than in driver: environment variables are set correctly

Date: 2019-10-25 10:13:08

Tags: python python-3.x apache-spark pyspark

I am running a Python script in a Jupyter notebook on Linux Mint.

The code isn't all that important, but here it is (it's from a GraphFrames tutorial):

import pandas
import pyspark

from functools import reduce
from graphframes import *
from IPython.display import display, HTML
from pyspark.context import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import col, lit, when
from pyspark.sql.session import SparkSession

# Reuse an existing SparkContext if one is already running
sc = SparkContext.getOrCreate()
sqlContext = SQLContext.getOrCreate(sc)
spark = SparkSession(sc)

vertices = sqlContext.createDataFrame(
    [
        ("a", "Alice", 34),
        ("b", "Bob", 36),
        ("c", "Charlie", 30),
        ("d", "David", 29),
        ("e", "Esther", 32),
        ("f", "Fanny", 36),
        ("g", "Gabby", 60),
    ],
    ["id", "name", "age"],
)

edges = sqlContext.createDataFrame(
    [
        ("a", "b", "friend"),
        ("b", "c", "follow"),
        ("c", "b", "follow"),
        ("f", "c", "follow"),
        ("e", "f", "follow"),
        ("e", "d", "friend"),
        ("d", "a", "friend"),
        ("a", "e", "friend"),
    ],
    ["src", "dst", "relationship"],
)

g = GraphFrame(vertices, edges)

# The first real action: toPandas() collects the result, which forces
# Spark to launch Python workers and triggers the version check
display(g.inDegrees.toPandas())

The last line is the one causing trouble; it gives the following error:

Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

The following two variables are set correctly:

printenv PYSPARK_PYTHON
-> /usr/bin/python3
printenv PYSPARK_DRIVER_PYTHON
-> /usr/bin/python3
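
The shell sees them, but the Jupyter kernel is a separate process and does not necessarily inherit them. A quick in-notebook check (just a sketch), run before any SparkContext is created:

import os

# Check what the notebook process itself sees; a Jupyter kernel does not
# always inherit variables exported in an interactive shell.
print(os.environ.get("PYSPARK_PYTHON"))
print(os.environ.get("PYSPARK_DRIVER_PYTHON"))

# If they come back as None, set them here, but only *before* the
# SparkContext is created (an existing context keeps the old values).
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"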

I also added them to the spark-env.sh file as follows:

# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

export PYSPARK_PYTHON=/usr/bin/python3       
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3   
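
Since pyspark is installed via pip (see the edit below), I am not even sure this spark-env.sh is the one that gets sourced; a quick sketch to see which installation and SPARK_HOME the kernel actually uses:

import os
import pyspark

# With a pip-installed PySpark, conf/spark-env.sh is only read if it lives
# under the SPARK_HOME that the launcher resolves to.
print(pyspark.__file__)               # where the package actually lives
print(os.environ.get("SPARK_HOME"))   # often None for pip installs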

But the error persists. Where else do I have to update these variables?

EDIT

python --version
Python 3.7.4

pip3 list | grep jupyter
jupyter               1.0.0      
jupyter-client        5.3.4      
jupyter-console       6.0.0      
jupyter-core          4.6.1      
jupyterlab            1.1.4      
jupyterlab-server     1.0.6     

pip3 list | grep pyspark
pyspark               2.4.4

1 Answer:

Answer 0 (score: 1):

Most likely it's a Python version clash. Set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to /usr/bin/python. Alternatively, you can use a venv:

cd ~
python3 -m venv spark_test
cd spark_test
source ./bin/activate
pip3 install jupyterlab pyspark graphframes
jupyter notebook

You will have to put your Jupyter files into the newly created folder.
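
Another option (a sketch, assuming Spark >= 2.1, which added the spark.pyspark.python and spark.pyspark.driver.python properties) is to pin both interpreters in the Spark config when the context is created, so the shell environment no longer matters:

from pyspark import SparkConf, SparkContext

# Pin driver and worker interpreters through Spark configuration;
# these properties only take effect when the context is first created,
# so restart the kernel if a context already exists.
conf = (
    SparkConf()
    .set("spark.pyspark.python", "/usr/bin/python3")
    .set("spark.pyspark.driver.python", "/usr/bin/python3")
)
sc = SparkContext.getOrCreate(conf=conf)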