I created a Python package that is used in my main Python file, which will run across a YARN cluster via spark-submit. These are the steps I followed.
1) Suppose I have a package named auditing, which has the subpackages abc_pkg_1 and abc_pckg_2.
2) I have a main file, test.py, where I use that package.
3) I created an egg file for the auditing package using a setup.py located outside the package.
4) I ran spark-submit with --py-files dist/auditing-0.0.1-py3.6.egg
setup.py (for the egg file):
from setuptools import setup, find_packages
setup(
    name="auditing",
    version="0.0.1",
    author="Example Author",
    packages=find_packages()
)
test.py:
from auditing import Driver
Error in the YARN logs:
ModuleNotFoundError: No module named 'auditing'
Command used to create the egg file:
python3 setup.py bdist_egg
It does not work even in the pyspark shell; I get the same module-not-found error.
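
One thing that might help narrow this down is checking whether the egg actually contains the auditing package, since find_packages() only picks up directories that have an __init__.py. The following is a minimal sketch, not something from my actual setup: packages_in_egg is a hypothetical helper name, and it assumes the egg is an ordinary zip archive that still includes the .py source files.

```python
import sys
import zipfile


def packages_in_egg(egg_path):
    """Return the sorted dotted names of packages found in an egg (zip) file.

    A directory counts as a package only if it contains an __init__.py,
    which mirrors what find_packages() looks for when building the egg.
    """
    suffix = "/__init__.py"
    with zipfile.ZipFile(egg_path) as egg:
        names = egg.namelist()
    packages = set()
    for name in names:
        if name.endswith(suffix):
            # "auditing/abc_pkg_1/__init__.py" -> "auditing.abc_pkg_1"
            packages.add(name[: -len(suffix)].replace("/", "."))
    return sorted(packages)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_egg.py dist/auditing-0.0.1-py3.6.egg
    print(packages_in_egg(sys.argv[1]))
```

If auditing does not show up in that list, the problem would be in how the egg is built rather than in how spark-submit ships it.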