spark.yarn.jars - py4j.protocol.Py4JError: An error occurred while calling None.None. Trace:

Asked: 2020-08-14 23:34:25

Tags: apache-spark pyspark

I am trying to run a Spark job with the spark2-submit command. The version of Spark installed on the cluster is Cloudera's Spark 2.1.0, and I am pointing at the 2.4.0 jars with the conf spark.yarn.jars, like this -

spark2-submit \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/virtualenv/path/bin/python \
  --conf spark.yarn.jars=hdfs:///some/path/spark24/* \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.task.cpus=2 \
  --executor-cores 2 \
  --executor-memory 4g \
  --driver-memory 4g \
  --archives /virtualenv/path \
  --files /etc/hive/conf/hive-site.xml \
  --name my_app \
  test.py

This is the code I have in test.py -

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

print("Spark Session created")
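Since the traceback below points at the 2.1.0 parcel's bundled py4j, it can help to check which copy of a library the driver's Python actually resolves from sys.path. This is a generic, Spark-free sketch (the `module_location` helper is made up for illustration; `json` stands in for `pyspark`):

```python
import importlib.util

def module_location(name):
    """Return the file path a module would be loaded from, or None
    if it is not importable at all."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# For example, module_location("pyspark") would show whether the
# parcel's pyspark.zip or a virtualenv copy wins on sys.path.
print(module_location("json"))
```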

When I run the submit command, I see a message like this -

yarn.Client: Source and destination file systems are the same. Not copying hdfs:///some/path/spark24/some.jar

and then I get this error on the line that creates the Spark session -

spark = SparkSession.builder.getOrCreate()
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/sql/session.py", line 169, in getOrCreate
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 310, in getOrCreate
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 259, in _ensure_initialized
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/java_gateway.py", line 117, in launch_gateway
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 175, in java_import
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling None.None. Trace:
Authentication error: unexpected command.

The py4j in the error comes from the existing Spark installation, not from the version in my jars. Are my Spark 2.4 jars not getting picked up? If I remove the jars conf, the same code runs fine, but presumably against the existing Spark 2.1.0. Any clues on how to fix this? Thanks.

1 Answer:

Answer 0 (score: 0)

It turned out the problem was that Python was running from the wrong location. I had to submit from the right place, like this -

PYTHONPATH=./${virtualenv}/venv/lib/python3.6/site-packages/ spark2-submit
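This works because directories listed in PYTHONPATH are placed on sys.path ahead of the interpreter's default site-packages, so the virtualenv's pyspark and py4j take precedence over the parcel's. A minimal, Spark-free illustration of that precedence (the module name `probe_mod` is invented for the demo):

```python
import os
import subprocess
import sys
import tempfile

def resolve_with_pythonpath(module_src: str, module_name: str) -> str:
    """Write a throwaway module into a temp dir, put that dir on
    PYTHONPATH, and report where a fresh interpreter imports it from."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, module_name + ".py"), "w") as f:
            f.write(module_src)
        env = dict(os.environ, PYTHONPATH=d)
        out = subprocess.run(
            [sys.executable, "-c",
             f"import {module_name}; print({module_name}.__file__)"],
            env=env, capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()

# The temp-dir copy wins because PYTHONPATH entries precede site-packages.
location = resolve_with_pythonpath("x = 1\n", "probe_mod")
print(location)
```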