AWS EMR logging with PySpark

Time: 2021-04-28 10:51:24

Tags: apache-spark hadoop pyspark log4j amazon-emr

I am trying to run PySpark code on EMR with the following command:

spark-submit --master yarn --conf spark.yarn.submit.waitAppCompletion=true --deploy-mode cluster --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 --py-files s3://logsetl--emr/py-dist/jobs.zip,s3://logsetl--emr/py-dist/shared.zip,s3://logsetl--emr/py-dist/libs.zip,s3://logsetl--emr/py-dist/schema.zip --files s3://logsetl--emr/py-dist/config.json s3://logsetl--emr/py-dist/main.py --job cdn --start_date '2021-04-14' --end_date '2021-04-14'

I am not bundling any extra dependency packages. The code runs fine on my local machine, but on the EMR cluster it fails with this error:
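For reference, the `spark-submit` call above passes `--job cdn --start_date '2021-04-14' --end_date '2021-04-14'` to `main.py`. A minimal, hypothetical sketch of how such an entry point might parse those arguments (the real parsing lives in `main.py`/`jobs.zip` and is not shown in the question):

```python
import argparse


def parse_args(argv=None):
    """Hypothetical reconstruction of main.py's CLI, based only on the
    flags visible in the spark-submit command above."""
    parser = argparse.ArgumentParser(description="EMR log ETL entry point")
    parser.add_argument("--job", required=True, help="job name, e.g. cdn")
    parser.add_argument("--start_date", required=True, help="YYYY-MM-DD")
    parser.add_argument("--end_date", required=True, help="YYYY-MM-DD")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.job, args.start_date, args.end_date)
```

If the real `main.py` uses a different flag name or makes an argument mandatory that the command omits, `argparse` calls `sys.exit(2)` before the SparkContext is created, which YARN then reports as "User application exited with status 1/2" in cluster mode.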

[2021-04-28 09:47:33.592]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/19/__spark_libs__3939163610008988309.zip/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/04/28 09:47:30 INFO SignalUtils: Registered signal handler for TERM
21/04/28 09:47:30 INFO SignalUtils: Registered signal handler for HUP
21/04/28 09:47:30 INFO SignalUtils: Registered signal handler for INT
21/04/28 09:47:31 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1619603156108_0001_000002
21/04/28 09:47:31 INFO ApplicationMaster: Starting the user application in a separate Thread
21/04/28 09:47:31 INFO ApplicationMaster: Waiting for spark context initialization...
21/04/28 09:47:32 ERROR ApplicationMaster: User application exited with status 1
21/04/28 09:47:32 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: User application exited with status 1)
21/04/28 09:47:32 ERROR ApplicationMaster: Uncaught exception: 
org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:500)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:264)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:890)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:889)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:889)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:111)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:728)
21/04/28 09:47:32 INFO ApplicationMaster: Deleting staging directory hdfs://ip-10-15-32-67.ec2.internal:8020/user/hadoop/.sparkStaging/application_1619603156108_0001
21/04/28 09:47:33 INFO ShutdownHookManager: Shutdown hook called

I am not sure what is going wrong, or whether a dependency is missing.
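The log above only says "User application exited with status 1"; the actual Python traceback ends up in the driver container's stdout/stderr, which in cluster mode you retrieve with `yarn logs -applicationId application_1619603156108_0001`. One way to make the failure easier to find there is to wrap the entry point so the full traceback is always printed to stderr before exiting. A minimal sketch (the wrapper name and structure are illustrative, not from the original code):

```python
import sys
import traceback


def run_with_traceback(job):
    """Run a job callable; on failure, print the full traceback to stderr
    (so it lands in the YARN container logs) and return a non-zero exit
    code so YARN marks the attempt FAILED."""
    try:
        job()
        return 0
    except Exception:
        traceback.print_exc(file=sys.stderr)
        return 1


if __name__ == "__main__":
    # In the real main.py this would dispatch to the selected --job.
    sys.exit(run_with_traceback(lambda: None))
```

A common cause of this exact failure pattern is a module that imports fine locally but is missing on the cluster (e.g. a package not included in the submitted zips), in which case the traceback in the YARN logs will show an `ImportError`/`ModuleNotFoundError`.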

0 answers