I am trying to submit a Spark program through Spark's internal REST API. The request below submits the program; the required supporting jars are in place.
curl -X POST http://quickstart.cloudera:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "SampleSparkProgramApp" ],
  "appResource" : "file:///home/cloudera/test_sample_example/spark-example.jar",
  "clientSparkVersion" : "1.5.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.example.SampleSparkProgram",
  "sparkProperties" : {
    "spark.jars" : "file:///home/cloudera/test_sample_example/lib/mongo-hadoop-core-1.0-snapshot.jar,file:///home/cloudera/test_sample_example/lib/mongo-java-driver-3.0.4.jar,file:///home/cloudera/test_sample_example/lib/lucene-analyzers-common-5.4.0.jar,file:///home/cloudera/test_sample_example/lib/lucene-core-5.2.1.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyJob",
    "spark.eventLog.enabled" : "true",
    "spark.submit.deployMode" : "client",
    "spark.master" : "spark://quickstart.cloudera:6066"
  }
}'
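For completeness, the submissionId returned by the create call (driver-20160121040910-0026 in the log below) can be polled through the same REST API; this is a sketch assuming the standard standalone /v1/submissions/status endpoint:

curl http://quickstart.cloudera:6066/v1/submissions/status/driver-20160121040910-0026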
The class com.mongodb.hadoop.MongoInputFormat is available in mongo-hadoop-core-1.0-snapshot.jar, and that jar is added to the request under the "spark.jars" key.
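For example, the class can be confirmed to be inside the jar like this (assuming the jar path from the request above):

jar tf /home/cloudera/test_sample_example/lib/mongo-hadoop-core-1.0-snapshot.jar | grep MongoInputFormat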
Still, I am getting the following NoClassDefFoundError in the Spark UI driver logs. Why is the class not found at runtime even though the jar is listed under spark.jars?
1.5.0-cdh5.5.0 stderr log page for driver-20160121040910-0026
Launch Command: "/usr/java/jdk1.7.0_67-cloudera/jre/bin/java" "-cp" "/usr/lib/spark/sbin/../conf/:/usr/lib/spark/lib/spark-assembly-1.5.0-cdh5.5.0-hadoop2.6.0-cdh5.5.0.jar:/etc/hadoop/conf/:/usr/lib/spark/sbin/../lib/spark-assembly.jar:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/paquet/lib/*:/usr/lib/avro/lib/*" "-Xms1024M" "-Xmx1024M" "-Dspark.eventLog.enabled=true" "-Dspark.driver.supervise=false" "-Dspark.app.name=MyJob" "-Dspark.jars=file:///home/cloudera/test_sample_example/lib/mongo-hadoop-core-1.0-snapshot.jar,file:///home/cloudera/test_sample_example/lib/mongo-java-driver-3.0.4.jar,file:///home/cloudera/test_sample_example/lib/lucene-analyzers-common-5.4.0.jar,file:///home/cloudera/test_sample_example/lib/lucene-core-5.2.1.jar" "-Dspark.master=spark://quickstart.cloudera:7077" "-Dspark.submit.deployMode=client" "-XX:MaxPermSize=256m" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkWorker@182.162.106.131:7078/user/Worker" "/var/run/spark/work/driver-20160121040910-0026/spark-example.jar" "com.example.SampleSparkProgram" "SampleSparkProgramApp"
========================================
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/01/21 04:09:16 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 182.162.106.131 instead (on interface eth1)
16/01/21 04:09:16 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/01/21 04:09:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/21 04:09:21 INFO spark.SecurityManager: Changing view acls to: root
16/01/21 04:09:21 INFO spark.SecurityManager: Changing modify acls to: root
16/01/21 04:09:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/01/21 04:09:26 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/01/21 04:09:26 INFO Remoting: Starting remoting
16/01/21 04:09:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://Driver@182.162.106.131:38181]
16/01/21 04:09:27 INFO Remoting: Remoting now listens on addresses: [akka.tcp://Driver@182.162.106.131:38181]
16/01/21 04:09:27 INFO util.Utils: Successfully started service 'Driver' on port 38181.
16/01/21 04:09:27 INFO worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@182.162.106.131:7078/user/Worker
16/01/21 04:09:28 INFO spark.SparkContext: Running Spark version 1.5.0-cdh5.5.0
16/01/21 04:09:28 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@182.162.106.131:7078/user/Worker
16/01/21 04:09:28 INFO spark.SecurityManager: Changing view acls to: root
16/01/21 04:09:28 INFO spark.SecurityManager: Changing modify acls to: root
16/01/21 04:09:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/01/21 04:09:29 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/01/21 04:09:29 INFO Remoting: Starting remoting
16/01/21 04:09:29 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@182.162.106.131:35467]
16/01/21 04:09:29 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@182.162.106.131:35467]
16/01/21 04:09:29 INFO util.Utils: Successfully started service 'sparkDriver' on port 35467.
16/01/21 04:09:29 INFO spark.SparkEnv: Registering MapOutputTracker
16/01/21 04:09:30 INFO spark.SparkEnv: Registering BlockManagerMaster
16/01/21 04:09:30 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-6b24e210-4002-4e28-ac60-c2ecc497b914
16/01/21 04:09:30 INFO storage.MemoryStore: MemoryStore started with capacity 534.5 MB
16/01/21 04:09:31 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-bc7bc70e-af91-44cb-a764-8c6d1d9b3acc/httpd-65b7bbf1-af6d-4252-8629-95fcb60f706f
16/01/21 04:09:31 INFO spark.HttpServer: Starting HTTP Server
16/01/21 04:09:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/21 04:09:31 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38126
16/01/21 04:09:31 INFO util.Utils: Successfully started service 'HTTP file server' on port 38126.
16/01/21 04:09:31 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/01/21 04:09:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/21 04:09:33 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/01/21 04:09:33 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/01/21 04:09:33 INFO ui.SparkUI: Started SparkUI at http://182.162.106.131:4040
16/01/21 04:09:33 INFO spark.SparkContext: Added JAR hdfs:///user/cloudera/sample_example/lib/mongo-hadoop-core-1.0-snapshot.jar at hdfs:///user/cloudera/sample_example/lib/mongo-hadoop-core-1.0-snapshot.jar with timestamp 1453378173778
16/01/21 04:09:33 INFO spark.SparkContext: Added JAR hdfs:///user/cloudera/sample_example/lib/mongo-java-driver-3.0.4.jar at hdfs:///user/cloudera/sample_example/lib/mongo-java-driver-3.0.4.jar with timestamp 1453378173782
16/01/21 04:09:33 INFO spark.SparkContext: Added JAR hdfs:///user/cloudera/sample_example/lib/lucene-analyzers-common-5.4.0.jar at hdfs:///user/cloudera/sample_example/lib/lucene-analyzers-common-5.4.0.jar with timestamp 1453378173783
16/01/21 04:09:33 INFO spark.SparkContext: Added JAR hdfs:///user/cloudera/sample_example/lib/lucene-core-5.2.1.jar at hdfs:///user/cloudera/sample_example/lib/lucene-core-5.2.1.jar with timestamp 1453378173783
16/01/21 04:09:34 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/01/21 04:09:34 INFO client.AppClient$ClientEndpoint: Connecting to master spark://quickstart.cloudera:7077...
16/01/21 04:09:35 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160121040935-0025
16/01/21 04:09:36 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38749.
16/01/21 04:09:36 INFO netty.NettyBlockTransferService: Server created on 38749
16/01/21 04:09:36 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/01/21 04:09:36 INFO storage.BlockManagerMasterEndpoint: Registering block manager 182.162.106.131:38749 with 534.5 MB RAM, BlockManagerId(driver, 182.162.106.131, 38749)
16/01/21 04:09:36 INFO storage.BlockManagerMaster: Registered BlockManager
16/01/21 04:09:40 INFO scheduler.EventLoggingListener: Logging events to file:/tmp/spark-events/app-20160121040935-0025
16/01/21 04:09:40 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/01/21 04:09:40 INFO analyser.NaiveByesAnalyserFactory: ENTERING
16/01/21 04:09:40 INFO dao.MongoDataExtractor: ENTERING
16/01/21 04:09:41 INFO dao.MongoDataExtractor: EXITING
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: com/mongodb/hadoop/MongoInputFormat
at com.examples.dao.MongoDataExtractor.getData(MongoDataExtractor.java:35)
at com.examples.analyser.NaiveByesAnalyserFactory.getNaiveByesAnalyserFactory(NaiveByesAnalyserFactory.java:27)
at com.example.SampleSparkProgram.main(SampleSparkProgram.java:24)
... 6 more
Caused by: java.lang.ClassNotFoundException: com.mongodb.hadoop.MongoInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more
16/01/21 04:09:41 INFO spark.SparkContext: Invoking stop() from shutdown hook
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/01/21 04:09:41 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/01/21 04:09:41 INFO ui.SparkUI: Stopped Spark web UI at http://182.162.106.131:4040
16/01/21 04:09:41 INFO scheduler.DAGScheduler: Stopping DAGScheduler
16/01/21 04:09:41 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
16/01/21 04:09:41 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
16/01/21 04:09:41 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/21 04:09:41 INFO storage.MemoryStore: MemoryStore cleared
16/01/21 04:09:41 INFO storage.BlockManager: BlockManager stopped
16/01/21 04:09:41 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/01/21 04:09:41 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/21 04:09:41 INFO spark.SparkContext: Successfully stopped SparkContext
16/01/21 04:09:41 INFO util.ShutdownHookManager: Shutdown hook called
16/01/21 04:09:41 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-bc7bc70e-af91-44cb-a764-8c6d1d9b3acc