spark-submit fails in yarn-cluster mode when --files is specified on an Azure HDInsight cluster

Date: 2020-01-29 18:09:10

Tags: apache-spark pyspark azure-hdinsight

My spark-submit job fails in yarn-cluster mode but succeeds in client mode.

The spark-submit command:

spark-submit \
--master yarn --deploy-mode cluster \
--py-files packages.zip,deps2.zip \
--files /home/sshsanjeev/git/pyspark-example-demo/configs/etl_config.json \
jobs/etl_job.py
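
For context on the flags: in yarn-cluster mode, each path given to --files is shipped with the application and localized into the YARN containers' working directories under its base name, so code that opens configs/etl_config.json by its repository path will only work where that path actually exists (e.g. on the client machine). A mode-tolerant lookup could look like the sketch below; the function name, search order, and fallback paths are illustrative assumptions, not the actual start_spark helper inside packages.zip.

import json
import os

from pyspark import SparkFiles


def load_job_config(filename="etl_config.json"):
    """Locate the JSON config in both client and cluster deploy modes.

    Hypothetical helper for illustration only; call it after the
    SparkSession has been created so SparkFiles can resolve paths.
    """
    candidates = [
        filename,                           # yarn-cluster: --files lands in the container's CWD
        SparkFiles.get(filename),           # files registered via SparkContext.addFile
        os.path.join("configs", filename),  # local repo layout, client mode
    ]
    for path in candidates:
        if os.path.isfile(path):
            with open(path) as config_file:
                return json.load(config_file)
    raise IOError("could not locate " + filename)

In client mode the repository path resolves; in cluster mode the bare base name does, because YARN has already copied the shipped file next to the driver.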

Error stack:

Traceback (most recent call last):
  File "etl_job.py", line 51, in <module>
    main()
  File "etl_job.py", line 11, in main
    app_name='my_etl_job',spark_config={'spark.sql.shuffle.partitions':2})
  File "/mnt/resource/hadoop/yarn/local/usercache/sshsanjeev/appcache/application_1555349704365_0218/container_1555349704365_0218_01_000001/packages.zip/dependencies/spark_conn.py", line 20, in start_spark
  File "/usr/hdp/current/spark2-client/python/pyspark/context.py", line 891, in addFile
    self._jsc.sc().addFile(path, recursive)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o204.addFile.
: java.io.FileNotFoundException: File file:/mnt/resource/hadoop/yarn/local/usercache/sshsanjeev/appcache/application_1555349704365_0218/container_1555349704365_0218_01_000001/configs/etl_config.json does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

I did several online searches and followed this post: https://community.cloudera.com/t5/Support-Questions/Spark-job-fails-in-cluster-mode/td-p/58772, but the issue is still unresolved.

Note that I have already tried both approaches, placing the config file on the namenode's local path as well as in an HDFS directory, but I hit the same error either way. Again, in client mode it runs successfully. I need guidance.

Here are the stack versions of my HDP cluster:

HDP 2.6.5.3008, YARN 2.7.3, Spark2 2.3.2

Let me know if further information is needed. Any suggestions would be greatly appreciated.

1 Answer:

Answer 0 (score: 0):

This may be related to a permissions problem that prevents the directory from being created. If the directory is not created, there is no placeholder to put intermediate results in, so the job fails. The directory referred to as /mnt/resource/hadoop/yarn/local/usercache/&lt;username&gt;/appcache/&lt;applicationID&gt; is used to store intermediate results, which then go to HDFS or memory depending on whether they are written out to a path or kept in a temporary table. The user may not have permission on it. It is emptied once the job completes. Granting the user the correct permissions on the path /mnt/resource/hadoop/yarn/local/usercache on the specific worker node should resolve this issue.
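
To verify that theory before changing anything, a quick diagnostic run as the submitting user on the affected worker node could look like this; the path and username come from the stack trace above, and the snippet is only an illustrative sketch:

import os
import stat

# YARN local usercache directory seen in the stack trace; adjust per node.
usercache = "/mnt/resource/hadoop/yarn/local/usercache/sshsanjeev"

st = os.stat(usercache)
print("owner uid:", st.st_uid)
print("mode:", oct(stat.S_IMODE(st.st_mode)))
print("writable by current user:", os.access(usercache, os.W_OK))

If the directory is missing or not writable, fixing its ownership and mode as suggested above is the next step.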