I have a small script that I submit as a job through AWS EMR. After I changed the instance type from m3.xlarge to m4.xlarge, I suddenly get an error message and the cluster terminates without completing all its steps. The script is:
aws emr create-cluster --name "XXXXXX" --ami-version 3.7 --applications Name=Hive --use-default-roles --ec2-attributes KeyName=gattami,SubnetId=subnet-xxxxxxx \
--instance-type=m4.xlarge --instance-count 3 \
--log-uri s3://pythonpicode/ --bootstrap-actions Path=s3://eu-central-1.support.elasticmapreduce/spark/install-spark,Name=Spark,Args=[-x] --steps Name="PythonPi",Jar=s3://eu-central-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn,--class,s3://pythonpicode/,s3://pythonpicode/PythonPi.py],ActionOnFailure=CONTINUE --auto-terminate
The error message I get is:
Exception in thread "main" java.lang.IllegalArgumentException: Unknown/unsupported param List(--executor-cores, , --files, s3://pythonpicode/PythonPi.py, --primary-py-file, PythonPi.py, --class, org.apache.spark.deploy.PythonRunner)
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster
mode)
--class CLASS_NAME Name of your application's main class (required)
--primary-py-file A main Python file
--arg ARG Argument to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-executors NUM Number of executors to start (Default: 2)
--executor-cores NUM Number of cores per executor (Default: 1).
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--driver-cores NUM Number of cores used by the driver (Default: 1).
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default:
'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar
to work with.
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to
place on the PYTHONPATH for Python apps.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
at org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:228)
at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:56)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:646)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Command exiting with ret '1'
I also tried the following alternative:
aws emr create-cluster --name "XXXXXXX" --release-label emr-4.7.2 --applications Name=Spark --ec2-attributes KeyName=xxxxxxx,SubnetId=subnet-xxxxxxxx \
--instance-type=m4.xlarge --instance-count 3 \
--log-uri s3://pythonpicode/ --steps Type=CUSTOM_JAR,Name="PythonPi",Jar="command-runner.jar",ActionOnFailure=CONTINUE,Args=[spark-submit,--master,yarn,--deploy-mode,cluster,s3://pythonpicode/PythonPi.py] --use-default-roles --auto-terminate
The (partial) error message I get from the step is as follows:
16/08/24 11:57:39 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:40 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:41 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:42 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:43 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:44 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:45 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:46 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:47 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:48 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:49 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:50 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:51 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:52 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:53 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:54 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:55 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:56 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:57 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:58 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:59 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:58:00 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:58:01 INFO Client: Application report for application_1472039667248_0001 (state: FAILED)
16/08/24 11:58:01 INFO Client:
client token: N/A
diagnostics: Application application_1472039667248_0001 failed 2 times due to AM Container for appattempt_1472039667248_0001_000002 exited with exitCode: -104
For more detailed output, check application tracking page:http://ip-172-31-21-32.eu-central-1.compute.internal:8088/cluster/app/application_1472039667248_0001Then, click on links to logs of each attempt.
Diagnostics: Container [pid=5713,containerID=container_1472039667248_0001_02_000001] is running beyond physical memory limits. Current usage: 2.0 GB of 1.4 GB physical memory used; 3.3 GB of 6.9 GB virtual memory used. Killing container.
Dump of the process-tree for container_1472039667248_0001_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 5748 5721 5713 5713 (python) 301 29 1343983616 246463 python PythonPi.py
|- 5721 5713 5713 5713 (java) 1594 93 2031308800 265175 /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/tmp -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError=kill -9 %p -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001 -XX:MaxPermSize=256m org.apache.spark.deploy.yarn.ApplicationMaster --class org.apache.spark.deploy.PythonRunner --primary-py-file PythonPi.py --executor-memory 5120m --executor-cores 4 --properties-file /mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/__spark_conf__/__spark_conf__.properties
|- 5713 5711 5713 5713 (bash) 0 0 115810304 715 /bin/bash -c LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/tmp '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001 -XX:MaxPermSize=256m org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.deploy.PythonRunner' --primary-py-file PythonPi.py --executor-memory 5120m --executor-cores 4 --properties-file /mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/__spark_conf__/__spark_conf__.properties 1> /var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001/stdout 2> /var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472039815698
final status: FAILED
tracking URL: http://ip-172-31-21-32.eu-central-1.compute.internal:8088/cluster/app/application_1472039667248_0001
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1472039667248_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/08/24 11:58:01 INFO ShutdownHookManager: Shutdown hook called
16/08/24 11:58:01 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7adbbd9f-2f68-49e3-85e6-9fdf960af87e
Command exiting with ret '1'
Answer 0 (score: 0)
You need to check your Spark version. Most likely you have an older version installed (e.g., 1.5) that does not support these parameters:
(--executor-cores, , --files, s3://pythonpicode/PythonPi.py, --primary-py-file, PythonPi.py, --class, org.apache.spark.deploy.PythonRunner)
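One quick way to confirm which version is actually installed is to run spark-submit with the --version flag on the master node. A minimal sketch, assuming you can SSH in with the key pair from your command (gattami) and substitute your cluster's actual master public DNS for the placeholder:

ssh -i gattami.pem hadoop@<master-public-dns> 'spark-submit --version'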
I recommend you try the stable 4.7.2 release (release label emr-4.7.2), which offers Spark 1.6 as a standard application.
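For the memory failure in your second attempt ("running beyond physical memory limits" on the ApplicationMaster container), note that in cluster deploy mode the driver runs inside the AM container, so its size is governed by the driver memory setting. A sketch of your emr-4.7.2 command with one change, the added --driver-memory flag (the 2g value is my assumption as a starting point above the 1.4 GB limit shown in your diagnostics, not a confirmed fix):

aws emr create-cluster --name "PythonPi" --release-label emr-4.7.2 --applications Name=Spark --ec2-attributes KeyName=gattami,SubnetId=subnet-xxxxxxxx \
--instance-type=m4.xlarge --instance-count 3 \
--log-uri s3://pythonpicode/ --steps Type=CUSTOM_JAR,Name="PythonPi",Jar="command-runner.jar",ActionOnFailure=CONTINUE,Args=[spark-submit,--master,yarn,--deploy-mode,cluster,--driver-memory,2g,s3://pythonpicode/PythonPi.py] --use-default-roles --auto-terminate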