I cannot get Python code running on a Windows development machine to connect to a standalone Spark cluster running in Docker on that same Windows machine.
If that cannot be done, the fallback is to give up on debugging the Python code from the Windows machine and instead test it inside the Linux Docker container: keep the code on a persistent volume the container can access, then submit/run jobs from there. Frankly, that is probably the best way to structure the code for production anyway, and it may eliminate the need to untangle the Windows networking nightmare. I need an architecture that supports running the code in the Azure cloud, so I may need to switch to cluster mode for that, perhaps even making cluster mode the default. I do suspect there may be a problem running Python (PySpark) in cluster mode, though, since as I understand it the standalone cluster manager does not currently support cluster deploy mode for Python applications.
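As a sketch of that in-container workflow, the script below is the kind of self-contained job one could keep on the shared volume and submit from inside the master container, e.g. with docker exec spark-master spark-submit /app/jobs/smoke_test.py (the /app/jobs path and file name are placeholders for wherever the volume is actually mounted):

# smoke_test.py -- minimal PySpark job to verify the standalone cluster works.
# Run from inside the container, so "spark-master" resolves on the Docker network.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")  # master URL as seen from inside the Docker network
    .appName("smoke-test")
    .getOrCreate()
)

# A trivial distributed computation: if this returns, the executors are reachable.
evens = spark.sparkContext.parallelize(range(1000)).filter(lambda x: x % 2 == 0).count()
print(f"even numbers: {evens}")

spark.stop()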
The relevant portion of the docker-compose file:
spark-master:
  image: appx_clean_spark:latest
  container_name: spark-master
  depends_on:
    - redis
  hostname: spark-master
  ports:
    - "4040:4040"
    - "8080:8080"
    - "7077:7077"
    - "51400:51400"
    - "51500:51500"
  networks:
    - gateway
  environment:
    MASTER: "spark://spark-master:7077"
    SPARK_PUBLIC_DNS: "172.18.0.4"
    # Added new below
    SPARK_MASTER_PORT: "7077"
    SPARK_MASTER_WEBUI_PORT: "8080"
    SPARK_DRIVER_PORT: "7001"
    SPARK_BROADCAST_FACTORY: "org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_MASTER_OPTS: >-
      -Dspark.ui.port=4040
      -Dspark.fileserver.port=7002
      -Dspark.broadcast.port=7003
      -Dspark.replClassServer.port=7004
      -Dspark.blockManager.port=7005
      -Dspark.executor.port=7006
    # Added new above
  expose:
    - "4040"
    - "7001"
    - "7002"
    - "7003"
    - "7004"
    - "7005"
    - "7006"
    - "7077"
    - "8080"
    - "8888"
    - "51400"
    - "51500"
  command: "/start-master.sh"
  # restart: always

spark-worker:
  image: appx_clean_spark:latest
  container_name: spark-worker
  depends_on:
    - spark-master
  ports:
    - "7005:7005"
    - "7006:7006"
    - "8081:8081"
    - "6066:6066"
  networks:
    - gateway
  environment:
    SPARK_PUBLIC_DNS: "172.18.0.4"
    SPARK_MASTER: "spark://spark-master:7077"
    # Added below
    SPARK_WORKER_CORES: "4"
    SPARK_WORKER_MEMORY: "1g"
    SPARK_WORKER_PORT: "6066"
    SPARK_WORKER_WEBUI_PORT: "8081"
    SPARK_DRIVER_PORT: "7001"
    SPARK_BROADCAST_FACTORY: "org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_WORKER_OPTS: >-
      -Dspark.ui.port=4040
      -Dspark.fileserver.port=7002
      -Dspark.broadcast.port=7003
      -Dspark.replClassServer.port=7004
      -Dspark.blockManager.port=7005
      -Dspark.executor.port=7006
    # Added above
  expose:
    - "4040"
    - "6066"
    - "7001"
    - "7002"
    - "7003"
    - "7004"
    - "7005"
    - "7006"
    - "8081"
    - "8888"
  command: "/start-worker.sh"
  # restart: always

networks:
  gateway:
    driver: bridge
    ipam:
      driver: default
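For the client-mode attempt, the ports pinned above only help if the driver on the Windows side pins the matching ones and advertises an address the containers can reach back to; that back-connection is usually the Windows networking pain point. A hedged sketch of the driver-side settings, where "192.168.1.10" is a placeholder for the Windows host's IP as seen from the containers (on Docker Desktop, host.docker.internal may also resolve):

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    .setMaster("spark://localhost:7077")     # 7077 is published to the host by the compose file
    .setAppName("client-mode-from-windows")
    # Address and ports the executors use to call back into the driver on Windows.
    # "192.168.1.10" is a placeholder; substitute an IP the containers can actually reach.
    .set("spark.driver.host", "192.168.1.10")
    .set("spark.driver.port", "7001")         # matches SPARK_DRIVER_PORT above
    .set("spark.blockManager.port", "7005")   # matches -Dspark.blockManager.port above
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()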
I want to be able to connect from Visual Studio Code to either the local standalone Spark cluster in Docker or an Azure Spark cluster by changing only a few lines of code.
I am trying to get client mode working first, and then cluster mode (if PySpark supports it).
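As a minimal sketch of the "change only a few lines" goal, the master URL can be the single thing that differs between environments; SPARK_MASTER_URL is a hypothetical variable name, and the Azure value would depend on the target service:

import os
from pyspark.sql import SparkSession

# The per-environment difference lives in one variable (hypothetical name):
#   local Docker: SPARK_MASTER_URL=spark://localhost:7077
#   Azure:        SPARK_MASTER_URL=<whatever endpoint the Azure cluster exposes>
master = os.environ.get("SPARK_MASTER_URL", "spark://localhost:7077")

spark = (
    SparkSession.builder
    .master(master)
    .appName("portable-job")
    .getOrCreate()
)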