spark-submit does not pick up modules and submodules of the project structure

Date: 2019-11-14 06:40:16

Tags: pyspark pycharm spark-submit

Folder structure of the PySpark project in PyCharm:

TEST
    TEST (marked as sources root)
        com
            earl
                test
                    pyspark
                        utils
                            utilities.py
                        test_main.py
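
Note that for the com.earl.test.pyspark.utils.utilities import used below to resolve as a regular Python package, each directory on the path normally needs an __init__.py marker file (empty files are fine). A minimal sketch of the markers, relative to the sources root; Python 3 can sometimes fall back to namespace packages without them, but explicit markers are the safer choice with spark-submit:

com/__init__.py
com/earl/__init__.py
com/earl/test/__init__.py
com/earl/test/pyspark/__init__.py
com/earl/test/pyspark/utils/__init__.py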

test_main.py has:

from _ast import arg

__author__ = "earl"
from pyspark.sql.functions import to_json, struct, lit
from com.earl.test.pyspark.utils.utilities import *
import sys

utilities.py has:

__author__ = "earl"

from py4j.protocol import Py4JJavaError
from pyspark.sql import SparkSession
import sys

In PyCharm, I execute the code by running test_main.py, and it works absolutely fine: the functions from utilities.py are called and execute perfectly. Under Run -> Edit Configurations -> Parameters I set D:\Users\input\test.json localhost:9092, and read them with sys.argv[1] and sys.argv[2] respectively.
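
A minimal sketch of how those two parameters would be read (the variable names are illustrative assumptions, not taken from the original code):

import sys

# sys.argv[1] -> D:\Users\input\test.json  (input file path)
# sys.argv[2] -> localhost:9092            (host:port, e.g. a Kafka broker)
input_path = sys.argv[1]
broker = sys.argv[2]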

The spark-submit command:

spark-submit --master local --conf spark.sparkContext.setLogLevel=WARN --name test D:\Users\earl\com\earl\test\pyspark\test_main.py --files D:\Users\earl\com\test\pyspark\utils\utilities.py D:\Users\input\test.json localhost:9092

Error:

Traceback (most recent call last):
  File "D:\Users\earl\com\earl\test\pyspark\test_main.py", line 5, in <module>
    from com.earl.test.pyspark.utils.utilities import *
ModuleNotFoundError: No module named 'com'

1 Answer:

Answer 0 (score: 0)

Fixed it by setting the property below before running spark-submit.

PYTHONPATH was previously set to:

%PY_HOME%\Lib;%PY_HOME%\DLLs;%PY_HOME%\Lib\lib-tk

Updated it to include the project home directory:

set PYTHONPATH=%PYTHONPATH%;D:\Users\earl\TEST\ (path of the project home structure)

After that, only the main script needs to be mentioned in spark-submit.
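
For context on why this fixes the error: --files only ships files into each executor's working directory; it does not put them on the driver's import path, so it cannot resolve a ModuleNotFoundError. Also note that in the original command --files was placed after test_main.py, so spark-submit would pass it through as an application argument rather than parse it. With the project home on PYTHONPATH, the command presumably reduces to the following sketch (same paths and arguments as in the question; the setLogLevel conf is dropped since the log level is normally set via log4j or sc.setLogLevel in code):

spark-submit --master local --name test D:\Users\earl\com\earl\test\pyspark\test_main.py D:\Users\input\test.json localhost:9092

An alternative that avoids editing PYTHONPATH is spark-submit's --py-files option, which distributes .py, .zip, or .egg files and adds them to the PYTHONPATH of both the driver and the executors. A hedged sketch, assuming the package is zipped from the project home so that the com/... hierarchy is preserved (deps.zip is an illustrative name):

cd D:\Users\earl\TEST
powershell Compress-Archive -Path com -DestinationPath deps.zip
spark-submit --master local --name test --py-files deps.zip com\earl\test\pyspark\test_main.py D:\Users\input\test.json localhost:9092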