I have a small script that I submit as a job through AWS EMR. After I changed the instance type from m3.xlarge to m4.xlarge, I suddenly get an error message and the cluster terminates without completing all its steps. The script is:
aws emr create-cluster --name "XXXXXX" --ami-version 3.7 --applications Name=Hive --use-default-roles --ec2-attributes KeyName=gattami,SubnetId=subnet-xxxxxxx \
--instance-type=m4.xlarge --instance-count 3 \
--log-uri s3://pythonpicode/ --bootstrap-actions Path=s3://eu-central-1.support.elasticmapreduce/spark/install-spark,Name=Spark,Args=[-x] --steps Name="PythonPi",Jar=s3://eu-central-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn,--class,s3://pythonpicode/,s3://pythonpicode/PythonPi.py],ActionOnFailure=CONTINUE --auto-terminate
The error message I get is:
Exception in thread "main" java.lang.IllegalArgumentException: Unknown/unsupported param List(--executor-cores, , --files, s3://pythonpicode/PythonPi.py, --primary-py-file, PythonPi.py, --class, org.apache.spark.deploy.PythonRunner)
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster
mode)
--class CLASS_NAME Name of your application's main class (required)
--primary-py-file A main Python file
--arg ARG Argument to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-executors NUM Number of executors to start (Default: 2)
--executor-cores NUM Number of cores per executor (Default: 1).
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--driver-cores NUM Number of cores used by the driver (Default: 1).
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default:
'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar
to work with.
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to
place on the PYTHONPATH for Python apps.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
at org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:228)
at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:56)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:646)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Command exiting with ret '1'
I also tried the following alternative:
aws emr create-cluster --name "XXXXXXX" --release-label emr-4.7.2 --applications Name=Spark --ec2-attributes KeyName=xxxxxxx,SubnetId=subnet-xxxxxxxx \
--instance-type=m4.xlarge --instance-count 3 \
--log-uri s3://pythonpicode/ --steps Type=CUSTOM_JAR,Name="PythonPi",Jar="command-runner.jar",ActionOnFailure=CONTINUE,Args=[spark-submit,--master,yarn,--deploy-mode,cluster,s3://pythonpicode/PythonPi.py] --use-default-roles --auto-terminate
The (partial) error message I get from the step is as follows:
16/08/24 11:57:39 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:40 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:41 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:42 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:43 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:44 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:45 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:46 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:47 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:48 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:49 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:50 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:51 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:52 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:53 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:54 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:55 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:56 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:57 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:58 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:59 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:58:00 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:58:01 INFO Client: Application report for application_1472039667248_0001 (state: FAILED)
16/08/24 11:58:01 INFO Client:
client token: N/A
diagnostics: Application application_1472039667248_0001 failed 2 times due to AM Container for appattempt_1472039667248_0001_000002 exited with exitCode: -104
For more detailed output, check application tracking page:http://ip-172-31-21-32.eu-central-1.compute.internal:8088/cluster/app/application_1472039667248_0001Then, click on links to logs of each attempt.
Diagnostics: Container [pid=5713,containerID=container_1472039667248_0001_02_000001] is running beyond physical memory limits. Current usage: 2.0 GB of 1.4 GB physical memory used; 3.3 GB of 6.9 GB virtual memory used. Killing container.
Dump of the process-tree for container_1472039667248_0001_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 5748 5721 5713 5713 (python) 301 29 1343983616 246463 python PythonPi.py
|- 5721 5713 5713 5713 (java) 1594 93 2031308800 265175 /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/tmp -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError=kill -9 %p -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001 -XX:MaxPermSize=256m org.apache.spark.deploy.yarn.ApplicationMaster --class org.apache.spark.deploy.PythonRunner --primary-py-file PythonPi.py --executor-memory 5120m --executor-cores 4 --properties-file /mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/__spark_conf__/__spark_conf__.properties
|- 5713 5711 5713 5713 (bash) 0 0 115810304 715 /bin/bash -c LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/tmp '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001 -XX:MaxPermSize=256m org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.deploy.PythonRunner' --primary-py-file PythonPi.py --executor-memory 5120m --executor-cores 4 --properties-file /mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/__spark_conf__/__spark_conf__.properties 1> /var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001/stdout 2> /var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472039815698
final status: FAILED
tracking URL: http://ip-172-31-21-32.eu-central-1.compute.internal:8088/cluster/app/application_1472039667248_0001
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1472039667248_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/08/24 11:58:01 INFO ShutdownHookManager: Shutdown hook called
16/08/24 11:58:01 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7adbbd9f-2f68-49e3-85e6-9fdf960af87e
Command exiting with ret '1'
Answer 0 (score: 0)
You need to check your Spark version. Most likely you have an older version installed (e.g., 1.5) that does not support these parameters:
(--executor-cores, , --files, s3://pythonpicode/PythonPi.py, --primary-py-file, PythonPi.py, --class, org.apache.spark.deploy.PythonRunner)
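One quick way to confirm which version is actually installed is to run spark-submit with the --version flag on the master node. A minimal sketch, assuming you can SSH in with the key pair from your command (gattami) and substitute your cluster's actual master public DNS for the placeholder:

ssh -i gattami.pem hadoop@<master-public-dns> 'spark-submit --version'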
I recommend you try the stable 4.7.2 release (release label emr-4.7.2), which offers Spark 1.6 as a standard application.
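For the memory failure in your second attempt ("running beyond physical memory limits" on the ApplicationMaster container), note that in cluster deploy mode the driver runs inside the AM container, so its size is governed by the driver memory setting. A sketch of your emr-4.7.2 command with one change, the added --driver-memory flag (the 2g value is my assumption as a starting point above the 1.4 GB limit shown in your diagnostics, not a confirmed fix):

aws emr create-cluster --name "PythonPi" --release-label emr-4.7.2 --applications Name=Spark --ec2-attributes KeyName=gattami,SubnetId=subnet-xxxxxxxx \
--instance-type=m4.xlarge --instance-count 3 \
--log-uri s3://pythonpicode/ --steps Type=CUSTOM_JAR,Name="PythonPi",Jar="command-runner.jar",ActionOnFailure=CONTINUE,Args=[spark-submit,--master,yarn,--deploy-mode,cluster,--driver-memory,2g,s3://pythonpicode/PythonPi.py] --use-default-roles --auto-terminate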