I created a package that I want to ship to every executor node

Time: 2019-04-05 14:28:35

Tags: python python-3.x apache-spark pyspark

I created a Python package that is used in my main Python file, which will run on a YARN cluster via spark-submit. These are the steps I followed:

1) Suppose I have a package named auditing. auditing has subpackages named abc_pkg_1 and abc_pkg_2.
2) I have a main file test.py where I use that package.
3) I created an egg file for the auditing package using a setup.py located outside the package.
4) I ran spark-submit with --py-files dist/auditing-0.0.1-py3.6.egg.
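For concreteness, the layout described in steps 1–3 would presumably look something like this (the exact file names inside the subpackages are assumptions, not stated in the question):

```
project/
├── setup.py
├── test.py
└── auditing/
    ├── __init__.py
    ├── abc_pkg_1/
    │   └── __init__.py
    └── abc_pkg_2/
        └── __init__.py
```

Note that if setup.py does not sit next to the auditing/ directory, or if any __init__.py is missing, find_packages() will not pick the package up and the egg will be empty.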

setup.py (used to build the egg file):

from setuptools import setup, find_packages

setup(
    name="auditing",
    version="0.0.1",
    author="Example Author",
    packages=find_packages()
)

test.py:

from auditing import Driver
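For the import above to succeed, auditing's __init__.py must expose a Driver name. The question does not show that file, so the following is purely a hypothetical sketch of what it might contain:

```python
# Hypothetical sketch of auditing/__init__.py; the question only shows
# that test.py does `from auditing import Driver`, so Driver's actual
# contents are an assumption.
class Driver:
    def run(self):
        # placeholder for whatever auditing work Driver performs
        return "ok"


driver = Driver()
print(driver.run())
```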

Error in the YARN logs:

ModuleNotFoundError: No module named 'auditing'

Command used to create the egg file:

python3 setup.py bdist_egg
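An egg is just a zip archive, so one way to check whether the auditing package actually made it into the file built above is to list the archive's contents. A minimal sketch (the helper name is mine, and the stand-in egg built here is only for demonstration; in practice you would point it at dist/auditing-0.0.1-py3.6.egg):

```python
import os
import tempfile
import zipfile


def packages_in_egg(egg_path):
    """Return the top-level package names found inside an egg (zip) file."""
    with zipfile.ZipFile(egg_path) as zf:
        return {
            name.split("/")[0]
            for name in zf.namelist()
            if name.endswith("__init__.py")
        }


# Build a tiny stand-in egg for demonstration; the real file from the
# question would be dist/auditing-0.0.1-py3.6.egg.
tmp_dir = tempfile.mkdtemp()
egg_path = os.path.join(tmp_dir, "auditing-0.0.1-py3.6.egg")
with zipfile.ZipFile(egg_path, "w") as zf:
    zf.writestr("auditing/__init__.py", "")
    zf.writestr("auditing/abc_pkg_1/__init__.py", "")

print(packages_in_egg(egg_path))  # -> {'auditing'}
```

If `auditing` is missing from the output for the real egg, the packaging step (not spark-submit) is the likely cause of the ModuleNotFoundError.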

It does not work in the pyspark shell either; I get the same module-not-found error.

0 Answers:

No answers yet.