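I manage a system that submits Spark applications to a YARN cluster in client mode. A submitted Spark application takes about 30 seconds to become ready on the YARN cluster, which seems a bit slow to me.
Is there any way to shorten the startup time of a Spark application on a YARN cluster in client mode?
Here are my environment specs:
- Spark version: 2.1.1 (actual version is 2.1.1.2.6.2.0-205, provided by Hortonworks)
- YARN version: 2.7.3 (actual version is 2.7.3.2.6.2.0-205, provided by Hortonworks)
- Number of RMs: 2 (HA)
- Number of NMs: 300
Here is the sample command I used to check the Spark application startup time, followed by its output: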
$ /usr/hdp/current/spark2-client/bin/spark-shell -v \
    --master yarn \
    --deploy-mode client \
    --driver-memory 4g \
    --conf spark.default.parallelism=180 \
    --conf spark.executor.cores=6 \
    --conf spark.executor.instances=30 \
    --conf spark.executor.memory=6g \
    --conf spark.yarn.am.cores=4 \
    --conf spark.yarn.containerLauncherMaxThreads=30 \
    --conf spark.yarn.dist.archives=/usr/hdp/current/spark2-client/R/lib/sparkr.zip#sparkr \
    --conf spark.yarn.dist.files=/etc/spark2/conf/hive-site.xml \
    --proxy-user xxxx
Using properties file: /usr/hdp/current/spark2-client/conf/spark-defaults.conf
Adding default property: spark.history.kerberos.keytab=/etc/security/keytabs/spark.headless.keytab
Adding default property: spark.history.fs.logDirectory=hdfs:///spark2-history/
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
Adding default property: spark.yarn.queue=default
Adding default property: spark.yarn.historyServer.address=xxxx
Adding default property: spark.history.kerberos.principal=xxxx@xxxx
Adding default property: spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
Adding default property: spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
Adding default property: spark.eventLog.dir=hdfs:///spark2-history/
Adding default property: spark.history.ui.port=18081
Adding default property: spark.history.kerberos.enabled=true
Parsed arguments:
master yarn
deployMode client
executorMemory 6g
executorCores 6
totalExecutorCores null
propertiesFile /usr/hdp/current/spark2-client/conf/spark-defaults.conf
driverMemory 4g
driverCores null
driverExtraClassPath null
driverExtraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
driverExtraJavaOptions null
supervise false
queue null
numExecutors 30
files null
pyFiles null
archives null
mainClass org.apache.spark.repl.Main
primaryResource spark-shell
name Spark shell
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file /usr/hdp/current/spark2-client/conf/spark-defaults.conf:
(spark.history.kerberos.enabled,true)
(spark.yarn.queue,default)
(spark.default.parallelism,180)
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.history.kerberos.principal,xxxx@xxxx)
(spark.executor.memory,6g)
(spark.driver.memory,4g)
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.executor.instances,30)
(spark.yarn.historyServer.address,xxxx)
(spark.eventLog.enabled,true)
(spark.yarn.dist.files,/etc/spark2/conf/hive-site.xml)
(spark.history.ui.port,18081)
(spark.history.provider,org.apache.spark.deploy.history.FsHistoryProvider)
(spark.history.fs.logDirectory,hdfs:///spark2-history/)
(spark.yarn.am.cores,4)
(spark.yarn.containerLauncherMaxThreads,30)
(spark.history.kerberos.keytab,/etc/security/keytabs/spark.headless.keytab)
(spark.yarn.dist.archives,/usr/hdp/current/spark2-client/R/lib/sparkr.zip#sparkr)
(spark.eventLog.dir,hdfs:///spark2-history/)
(spark.executor.cores,6)
Main class:
org.apache.spark.repl.Main
Arguments:
System properties:
(spark.yarn.queue,default)
(spark.history.kerberos.enabled,true)
(spark.default.parallelism,180)
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.history.kerberos.principal,xxxx@xxxx)
(spark.driver.memory,4g)
(spark.executor.memory,6g)
(spark.executor.instances,30)
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.yarn.historyServer.address,xxxx)
(spark.eventLog.enabled,true)
(spark.yarn.dist.files,file:/etc/spark2/conf/hive-site.xml)
(spark.history.ui.port,18081)
(SPARK_SUBMIT,true)
(spark.history.provider,org.apache.spark.deploy.history.FsHistoryProvider)
(spark.app.name,Spark shell)
(spark.history.fs.logDirectory,hdfs:///spark2-history/)
(spark.yarn.am.cores,4)
(spark.yarn.containerLauncherMaxThreads,30)
(spark.jars,)
(spark.history.kerberos.keytab,/etc/security/keytabs/spark.headless.keytab)
(spark.submit.deployMode,client)
(spark.yarn.dist.archives,file:/usr/hdp/current/spark2-client/R/lib/sparkr.zip#sparkr)
(spark.eventLog.dir,hdfs:///spark2-history/)
(spark.master,yarn)
(spark.executor.cores,6)
Classpath elements:
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://xxxx:4040
Spark context available as 'sc' (master = yarn, app id = application_1519898793082_6925).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1.2.6.2.0-205
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
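For anyone who wants to reproduce the measurement, one rough way to time a launch is sketched below. This is only a minimal sketch: `time_startup` is a hypothetical helper name, not part of Spark, and the commented usage assumes that piping `:quit` into spark-shell makes it exit as soon as the REPL is ready, so the reported figure also includes shutdown time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): print the wall-clock seconds
# taken by an arbitrary command, discarding that command's own output.
time_startup() {
  local start end
  start=$(date +%s)          # whole seconds before the launch
  "$@" >/dev/null 2>&1       # run the command passed as arguments
  end=$(date +%s)            # whole seconds after it exits
  echo $(( end - start ))
}

# Usage sketch (assumption: ":quit" on stdin exits the REPL once it is ready):
# echo ":quit" | time_startup /usr/hdp/current/spark2-client/bin/spark-shell \
#     --master yarn --deploy-mode client
```

Because the timer has one-second resolution and covers the whole process lifetime, it is only suitable for coarse comparisons between configurations.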
Thanks.