I have a Spark job that I submit with spark-submit. Every time I run the jar, it fails with the error java.lang.ArrayIndexOutOfBoundsException: 1.
Here is the full stack trace:
[hadoop@batch-cluster-master data]$ /usr/lib/spark/bin/spark-submit --master yarn --queue refault --driver-memory 12G --executor-memory 12G --executor-cores 3 --driver-cores 2 --class com.orgid.dp.batch.sql.BatchDriver /tmp/dp-batch-sql.jar /home/hadoop/PT_Data/batch-sql-ps-pathFinder-working.json
16/05/18 00:22:56 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:56 INFO spark.SparkContext: Running Spark version 1.6.0
16/05/18 00:22:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/18 00:22:56 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:56 INFO spark.SecurityManager: Changing view acls to: hadoop
16/05/18 00:22:56 INFO spark.SecurityManager: Changing modify acls to: hadoop
16/05/18 00:22:56 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/05/18 00:22:57 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:57 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 37913.
16/05/18 00:22:57 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/05/18 00:22:57 INFO Remoting: Starting remoting
16/05/18 00:22:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.41.66.63:59598]
16/05/18 00:22:57 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 59598.
16/05/18 00:22:57 INFO spark.SparkEnv: Registering MapOutputTracker
16/05/18 00:22:57 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:57 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:57 INFO spark.SparkEnv: Registering BlockManagerMaster
16/05/18 00:22:57 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-56307d3d-6591-48bb-8bf8-f4989d71cd58
16/05/18 00:22:57 INFO storage.MemoryStore: MemoryStore started with capacity 8.4 GB
16/05/18 00:22:58 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/05/18 00:22:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/18 00:22:58 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/05/18 00:22:58 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/05/18 00:22:58 INFO ui.SparkUI: Started SparkUI at http://10.41.66.63:4040
16/05/18 00:22:58 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-301d676b-38f6-4895-8a04-af37c5b7fa99/httpd-0f747205-f207-476e-8317-6083d8fe0b37
16/05/18 00:22:58 INFO spark.HttpServer: Starting HTTP Server
16/05/18 00:22:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/18 00:22:58 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44525
16/05/18 00:22:58 INFO util.Utils: Successfully started service 'HTTP file server' on port 44525.
16/05/18 00:22:58 INFO spark.SparkContext: Added JAR file:/tmp/dp-batch-sql.jar at http://10.41.66.63:44525/jars/dp-batch-sql.jar with timestamp 1463511178539
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
[... the same 'spark.akka.retry.wait' deprecation warning repeated 12 more times ...]
16/05/18 00:22:58 INFO client.RMProxy: Connecting to ResourceManager at batch-cluster-master/10.41.66.63:8032
16/05/18 00:22:58 INFO yarn.Client: Requesting a new application from cluster with 5 NodeManagers
16/05/18 00:22:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (20480 MB per container)
16/05/18 00:22:58 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/05/18 00:22:58 INFO yarn.Client: Setting up container launch context for our AM
16/05/18 00:22:58 INFO yarn.Client: Setting up the launch environment for our AM container
16/05/18 00:22:59 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:264)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:262)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:262)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:635)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:633)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:633)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:721)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at com.orgid.dp.batch.sql.BatchDriver$.main(BatchDriver.scala:56)
at com.orgid.dp.batch.sql.BatchDriver.main(BatchDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/05/18 00:22:59 INFO ui.SparkUI: Stopped Spark web UI at http://10.41.66.63:4040
16/05/18 00:22:59 INFO cluster.YarnClientSchedulerBackend: Stopped
16/05/18 00:22:59 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/05/18 00:22:59 INFO storage.MemoryStore: MemoryStore cleared
16/05/18 00:22:59 INFO storage.BlockManager: BlockManager stopped
16/05/18 00:22:59 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/05/18 00:22:59 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
16/05/18 00:22:59 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/05/18 00:22:59 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:264)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:262)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:262)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:635)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:633)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:633)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:721)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at com.orgid.dp.batch.sql.BatchDriver$.main(BatchDriver.scala:56)
at com.orgid.dp.batch.sql.BatchDriver.main(BatchDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/18 00:22:59 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/05/18 00:22:59 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/05/18 00:22:59 INFO util.ShutdownHookManager: Shutdown hook called
16/05/18 00:22:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-301d676b-38f6-4895-8a04-af37c5b7fa99
16/05/18 00:22:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-301d676b-38f6-4895-8a04-af37c5b7fa99/httpd-0f747205-f207-476e-8317-6083d8fe0b37
16/05/18 00:22:59 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
The JSON used is:
{
  "driver.config": {
    "dp.batch.event.dir": "hdfs://hadoop.admin.com:9000/user/hadoop/parquet_data/output/",
    "dp.batch.udf.scan.packages": "com.orgid.dp.batch.udfs",
    "dp.batch.enable.pathfinder": "true",
    "dp.batch.input.timezone": "IST",
    "dp.admin.local": "true",
    "dp.admin.host": "",
    "dp.admin.port": "",
    "dp.batch.output.dir": "hdfs://hadoop.admin.com:9000/user/hadoop/aman/",
    "dp.batch.output.timezone": "IST",
    "dp.batch.output.date.dir.format": "yyyy/MM/dd/HH/mm",
    "dp.batch.output.partition.count": "4",
    "email.enable": "false",
    "email.sender": "feedsystemreports@abc.com, FeedSystem Reports",
    "email.recipient": "abcd@abc.com"
  },
  "dp.batch.read.data": {
    "last.hour": "",
    "last.day": "",
    "specific.date.startTime": "01:05:16:00:00:00",
    "specific.date.endTime": "15:05:16:23:59:59"
  },
  "pathFinder.config": {
    "dp.storage.db.connection.url": "jdbc:mysql://db.org.com:3306/dis",
    "dp.storage.db.user.name": "hadoop",
    "dp.storage.db.password": "hadoop"
  },
  "kafkaProducer.config": {
    "topic": "dp_batch_api",
    "bootstrap.servers": "kafka.org.com:9920",
    "replayJobEventTopic": "dp_batch_replay"
  },
  "expressions": [
    {
      "id": 30,
      "expression": "SELECT count(*) from appHeartBeat",
      "dependencies": ["appHeartBeat"],
      "alias": "",
      "doExport": true
    }
  ],
  "externalDependencies": [],
  "spark.config": {
    "spark.sql.caseSensitive": "true",
    "spark.driver.memory": "16G",
    "spark.executor.memory": "17G",
    "spark.executor.cores": "5",
    "spark.executor.instances": "25",
    "spark.yarn.executor.memoryOverhead": "2048",
    "spark.app.name": "dplite-batch-sql",
    "spark.core.connection.ack.wait.timeout": "600",
    "spark.rdd.compress": "false",
    "spark.akka.timeout": "600000",
    "spark.storage.blockManagerHeartBeatMs": "200000",
    "spark.storage.blockManagerSlaveTimeoutMs": "200000",
    "spark.akka.retry.wait": "120000",
    "conf spark.akka.frameSize": "1500",
    "spark.driver.maxResultSize": "1500",
    "spark.worker.timeout": "360000",
    "spark.driver.extraJavaOptions": "-XX:MaxPermSize=2048m -XX:PermSize=512m"
  }
}
I can't figure out where the problem is. Please help.
Thanks in advance.
Answer 0 (score: 2)
It looks like you are hitting this bug: https://issues.apache.org/jira/browse/YARN-3768
Upgrade YARN to a newer release (2.8+), or find and fix the environment variable that is missing a value. The exception comes from Spark parsing a comma-separated list of KEY=VALUE environment entries: an entry with an empty value (e.g. FOO=) splits into a single-element array, so reading the value at index 1 throws ArrayIndexOutOfBoundsException: 1.
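For context, your stack trace points at YarnSparkHadoopUtil.setEnvFromInputString, and the scala.Option.foreach frame inside Client.setupLaunchEnv suggests the string being parsed is most likely the SPARK_YARN_USER_ENV environment variable on the submitting host. Below is a minimal sketch of the failing pattern, modeled on (but simplified from) the Spark 1.6 parsing logic; the object and method names here are illustrative, not the actual Spark source:

    import scala.collection.mutable.HashMap

    object EnvParseRepro {
      // Mirrors the shape of Spark's setEnvFromInputString: split the input on
      // commas into KEY=VALUE entries, then split each entry on '='.
      def setEnvFromInputString(env: HashMap[String, String], input: String): Unit = {
        for (entry <- input.split(",")) {
          val parts = entry.split("=")
          // "BROKEN=".split("=") drops the trailing empty string and returns
          // Array("BROKEN"), so parts(1) throws ArrayIndexOutOfBoundsException: 1.
          env(parts(0)) = parts(1)
        }
      }

      def main(args: Array[String]): Unit = {
        val env = HashMap.empty[String, String]
        setEnvFromInputString(env, "GOOD=1,BROKEN=")  // second entry has no value
      }
    }

In practice, inspect the environment of the shell you submit from (for example, echo $SPARK_YARN_USER_ENV) and your spark-env.sh for an entry like FOO= or a stray trailing comma; removing or completing that entry should avoid the crash without upgrading YARN.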