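Python virtual environment [some Python libraries get installed outside the virtual environment directory]

Time: 2019-08-05 03:45:11

Tags: python python-3.x pyspark virtualenv

Scenario: We create a virtual environment and install everything listed in the require.txt file, but a few files end up outside the environment directory.

Use case: We want to zip this environment and use it for the Spark driver and executors.

Problem: Because a few files are installed outside the virtual environment directory, Spark fails with module-not-found exceptions or with missing lib*.so files.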

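1 answer:

Answer 0: (score: 0)

To solve this, I applied the following steps.

Blog write-up: https://kshitij-kuls.com/2019/08/04/setting-up-virtual-environment-for-pyspark/

Before continuing, you need to understand the basic layout of a Python virtual environment: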

├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── activate_this.py
│   ├── easy_install
│   ├── easy_install-3.6
│   ├── pip
│   ├── pip3
│   ├── pip3.6
│   ├── python
│   ├── python-config
│   ├── python3 -> python
│   ├── python3.6 -> python
│   └── wheel
├── include
│   └── python3.6m -> /usr/include/python3.6m
├── lib
│   └── python3.6
│       ├── site-packages
│       ├── lib-dynload -> /usr/lib/python3.6/lib-dynload [Dynamic Library]
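
The tree above contains entries (include/python3.6m and lib-dynload) that are symlinks to paths outside the environment. A minimal sketch for listing such links follows. It assumes a POSIX filesystem, and the function name find_external_symlinks is hypothetical, not part of virtualenv:

```python
import os


def find_external_symlinks(env_root):
    """Return (link, target) pairs for symlinks that resolve outside env_root."""
    root = os.path.realpath(env_root)
    external = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in dirnames, so they can be inspected here.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            # A target is "inside" only if it is the root or sits below it.
            if os.path.commonpath([root, target]) != root:
                external.append((os.path.relpath(path, root), target))
    return sorted(external)


if __name__ == "__main__":
    # "env" is the environment directory name used in this answer.
    for link, target in find_external_symlinks("env"):
        print(f"{link} -> {target}")
```

Anything this reports would not survive being zipped as a link and shipped to another machine, which is the failure described in the question.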

Environment variables:

PYSPARK_PYTHON: points to the Python executable: bin/python

LD_LIBRARY_PATH: points to the dynamic library path: lib/python3.6/lib-dynload [all .so* files]

PYTHONPATH: points to the packages installed in the virtual environment plus the dynamic library path: lib/python3.6/site-packages<CPS>lib/python3.6/lib-dynload [all .py files]

PYTHONHOME: points to the Python library path: lib/python3.6/site-packages

Steps to build the virtual environment:

Install the desired version of Python on the machine.
Create the virtual environment:
    virtualenv env -p /usr/local/bin/python3
Activate the virtual environment:
    source env/bin/activate
Install the requirements:
    pip install numpy
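
After installing, it can help to check where a module would actually be loaded from. The sketch below is only an illustration: is_inside and module_location are hypothetical helper names, and the module names in the loop are examples. Run it with the environment's own interpreter so that sys.prefix points at the environment:

```python
import importlib.util
import os
import sys


def is_inside(path, root):
    """True if path resolves to root itself or to something below it."""
    path, root = os.path.realpath(path), os.path.realpath(root)
    return os.path.commonpath([path, root]) == root


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # "numpy" matches the pip install step; the other names are examples of
    # modules that are often shipped as shared objects in lib-dynload.
    for name in ("numpy", "zlib", "_ssl"):
        origin = module_location(name)
        if origin is None or not os.path.isabs(origin):
            print(f"{name}: {origin or 'not found'}")
        else:
            where = "inside" if is_inside(origin, sys.prefix) else "OUTSIDE"
            print(f"{name}: {origin} ({where} {sys.prefix})")
```

Because os.path.realpath resolves symlinks, a module reached through a symlinked lib-dynload is reported as outside the environment.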

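Now here is the trick. In the tree above, the line ├── lib-dynload -> /usr/lib/python3.6/lib-dynload is a symlink that points to a path on the local machine, so if you zip only the virtual environment folder, those dependencies will be missing on the cluster. You therefore need to:

Copy all .so* files from /usr/lib/python3.6/lib-dynload, /usr/lib64/*.so.*, etc. to lib/python3.6/lib-dynload.
Copy all .py files from /usr/lib/python3.6/lib-dynload, /usr/lib64/*.so.*, etc. to lib/python3.6/site-packages.

Run these steps from the home directory of the virtual environment, in our case env/.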
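
One caveat when copying: as long as lib-dynload is still a symlink, files copied "into" it land in the system directory it points to. A minimal sketch of one way around that is to replace the link with a real directory holding a copy of its target. This covers only the lib-dynload part of the step (not the /usr/lib64/*.so.* files), and materialize_symlinked_dir is a hypothetical helper name:

```python
import os
import shutil


def materialize_symlinked_dir(link_path):
    """Replace a directory symlink with a real directory holding a copy of its target."""
    if not os.path.islink(link_path):
        return False  # already a real directory (or missing): nothing to do
    target = os.path.realpath(link_path)
    tmp_copy = link_path + ".copy"
    # symlinks=False copies the files that nested links point to, not the links.
    shutil.copytree(target, tmp_copy, symlinks=False)
    os.unlink(link_path)  # removes only the link, never the target directory
    os.rename(tmp_copy, link_path)
    return True


if __name__ == "__main__":
    # Run from the environment's home directory (env/ in this answer).
    materialize_symlinked_dir(os.path.join("lib", "python3.6", "lib-dynload"))
```

The copy is made before the link is removed, so a failed copy leaves the original symlink in place.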

Prepare zip
zip -rq ../venv.zip *
Upload the zip to the /udf folder for tdss: /tookitaki/tdss/udf/
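
The zip step can also be sketched in Python with the standard zipfile module. This is an approximation of the zip command above, not a replacement for it: zip_env is a hypothetical name, entry names are kept relative to the environment root, and symlinked directories are followed so their contents end up in the archive:

```python
import os
import zipfile


def zip_env(env_root, zip_path):
    """Archive env_root so that entry names are relative to it (bin/python, lib/...)."""
    env_root = os.path.abspath(env_root)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # followlinks=True descends into symlinked directories such as lib-dynload.
        for dirpath, _dirnames, filenames in os.walk(env_root, followlinks=True):
            for name in filenames:
                full = os.path.join(dirpath, name)
                if os.path.isfile(full):  # skip broken links
                    archive.write(full, os.path.relpath(full, env_root))
    return zip_path


if __name__ == "__main__":
    # Keep the archive outside the directory being zipped, as ../venv.zip does.
    if os.path.isdir("env"):
        zip_env("env", "venv.zip")
```

Following links means a cyclic symlink would make the walk loop, and a link such as lib64 pointing at lib stores the same files twice, so this is best run on an environment whose links have been checked first.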

Setting the environment variables

For the driver: spark.yarn.appMasterEnv.[Environment variable]

For the executors: spark.executorEnv.[Environment variable]

PYSPARK_PYTHON

pyspark.spark.yarn.appMasterEnv.PYSPARK_PYTHON = venv/bin/python
pyspark.spark.executorEnv.PYSPARK_PYTHON = venv/bin/python

PYTHONHOME

pyspark.spark.yarn.appMasterEnv.PYTHONHOME = venv/lib64/python3.6/site-packages
pyspark.spark.executorEnv.PYTHONHOME = venv/lib64/python3.6/site-packages

LD_LIBRARY_PATH

pyspark.spark.yarn.appMasterEnv.LD_LIBRARY_PATH = venv/lib64/python3.6/lib-dynload
pyspark.spark.executorEnv.LD_LIBRARY_PATH = venv/lib64/python3.6/lib-dynload
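
The three settings above follow one pattern, so they can be generated rather than typed. The sketch below uses the plain Spark property names (spark.yarn.appMasterEnv.* and spark.executorEnv.*). The leading "pyspark." in the lines above looks specific to this deployment, which is an assumption on my part. venv_spark_conf and as_submit_args are hypothetical helper names:

```python
def venv_spark_conf(venv_dir="venv", py_version="3.6"):
    """Build Spark properties that export the three variables to driver and executors."""
    lib_dir = f"{venv_dir}/lib64/python{py_version}"
    env = {
        "PYSPARK_PYTHON": f"{venv_dir}/bin/python",
        "PYTHONHOME": f"{lib_dir}/site-packages",
        "LD_LIBRARY_PATH": f"{lib_dir}/lib-dynload",
    }
    conf = {}
    for name, value in env.items():
        conf[f"spark.yarn.appMasterEnv.{name}"] = value  # driver side, per the text above
        conf[f"spark.executorEnv.{name}"] = value        # executors
    return conf


def as_submit_args(conf):
    """Render the properties as spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(as_submit_args(venv_spark_conf())))
```

The values are the same relative paths used in this answer, which only resolve if the archive is unpacked as venv in the container's working directory.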

PYTHONPATH

This one needs to be included in YARN-ENV-ENTRIES; it is not set from the Spark configuration.

PYTHONPATH = {{PWD}}/__venv__.zip<CPS>{{PWD}}/__py4j-0.10.7-src__.zip<CPS>venv/lib64/python3.6/site-packages<CPS>venv/lib64/python3.6/lib-dynload<CPS>
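
The PYTHONPATH value is a list of entries joined by <CPS>, which as far as I know is YARN's placeholder for the platform's path separator (":" on Linux), with {{PWD}} standing for the container's working directory. A small sketch that assembles the same string from its parts; yarn_pythonpath is a hypothetical helper name:

```python
CPS = "<CPS>"  # separator placeholder used in the PYTHONPATH entry above


def yarn_pythonpath(entries):
    """Join entries with <CPS>, keeping the trailing separator shown in the text."""
    return "".join(entry + CPS for entry in entries)


entries = [
    "{{PWD}}/__venv__.zip",
    "{{PWD}}/__py4j-0.10.7-src__.zip",
    "venv/lib64/python3.6/site-packages",
    "venv/lib64/python3.6/lib-dynload",
]

if __name__ == "__main__":
    print("PYTHONPATH = " + yarn_pythonpath(entries))
```

Keeping the entries in a list makes it easier to see which paths are archive files and which are directories inside the unpacked environment.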

To run Python:

cd venv

export PYTHONPATH=lib64/python3.6/site-packages:lib64/python3.6/lib-dynload/

export LD_LIBRARY_PATH=lib64/python3.6/lib-dynload

source bin/activate