Folder structure of the PySpark project in PyCharm:
TEST
    TEST (marked as sources root)
        com
            earl
                test
                    pyspark
                        utils
                            utilities.py
                        test_main.py
test_main.py contains:
from _ast import arg
__author__ = "earl"
from pyspark.sql.functions import to_json, struct, lit
from com.earl.test.pyspark.utils.utilities import *
import sys
utilities.py contains:
__author__ = "earl"
from py4j.protocol import Py4JJavaError
from pyspark.sql import SparkSession
import sys
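The function bodies in utilities.py are not shown in the question; a minimal sketch consistent with the imports above (the helper names get_spark_session and stop_session are hypothetical, not from the original) might look like:

# Hypothetical sketch only; the helper names are not from the original question.
import sys
from py4j.protocol import Py4JJavaError
from pyspark.sql import SparkSession

def get_spark_session(app_name):
    # Create a new SparkSession, or reuse the one already running.
    return SparkSession.builder.appName(app_name).getOrCreate()

def stop_session(spark):
    # Py4JJavaError wraps JVM-side Spark failures raised through py4j.
    try:
        spark.stop()
    except Py4JJavaError as err:
        print("Failed to stop the session: %s" % err, file=sys.stderr)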
On PyCharm, I execute the code by running test_main.py, and it works absolutely fine. Functions from utilities.py are called and execute perfectly. In PyCharm I set Run -> Edit Configurations -> Parameters to D:\Users\input\test.json localhost:9092 and read the two values with sys.argv[1] and sys.argv[2] respectively.
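For reference, a minimal sketch of how test_main.py might consume those two parameters (the names input_path and kafka_broker are hypothetical):

import sys

if __name__ == "__main__":
    # Matches the PyCharm run configuration: <json path> <kafka host:port>
    input_path = sys.argv[1]    # e.g. D:\Users\input\test.json
    kafka_broker = sys.argv[2]  # e.g. localhost:9092
    print(input_path, kafka_broker)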
The spark-submit command:
spark-submit --master local --conf spark.sparkContext.setLogLevel=WARN --name test D:\Users\earl\com\earl\test\pyspark\test_main.py --files D:\Users\earl\com\test\pyspark\utils\utilities.py D:\Users\input\test.json localhost:9092
Error:
Traceback (most recent call last):
File "D:\Users\earl\com\earl\test\pyspark\test_main.py", line 5, in <module>
from com.earl.test.pyspark.utils.utilities import *
ModuleNotFoundError: No module named 'com'
Answer 0 (score: 0):
Fixed it by setting the property below before running spark-submit. (PyCharm adds the sources root to sys.path automatically, which is why the import worked inside the IDE; spark-submit does not, so the project home has to be put on PYTHONPATH.)
PYTHONPATH was previously set to
%PY_HOME%\Lib;%PY_HOME%\DLLs;%PY_HOME%\Lib\lib-tk
and I updated it to also include the project home:
set PYTHONPATH=%PYTHONPATH%;D:\Users\earl\TEST\ (path of the project home structure)
I also updated the spark-submit command so that only the main script is mentioned:
spark-submit --master local --conf spark.sparkContext.setLogLevel=WARN --name test D:\Users\earl\com\earl\test\pyspark\test_main.py D:\Users\input\test.json localhost:9092
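A quick way to confirm the fix, as a sketch run from a Python shell in the same environment (assumes the PYTHONPATH update above is in effect):

import sys
# The project home added via PYTHONPATH should now appear on sys.path.
print([p for p in sys.path if "TEST" in p])
# This import should no longer raise ModuleNotFoundError.
from com.earl.test.pyspark.utils.utilities import *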