线程" main"中的例外情况org.apache.spark.SparkException:作业已中止:Spark群集向下看

Date: 2016-11-18 07:43:23

Tags: apache-spark

I am trying out SimpleApp.java against a Spark standalone cluster with a single worker. However, after every change I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted: Spark cluster looks down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I have the following setup:

  • A standalone master running on localhost
  • A worker registered with it

The following lines are from the master log:

Spark Command: /usr/lib/jvm/java-7-oracle/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --host 192.168.97.128 --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/11/18 12:36:57 INFO Master: Started daemon with process name: 6808@localhost
16/11/18 12:36:57 INFO SignalUtils: Registered signal handler for TERM
16/11/18 12:36:57 INFO SignalUtils: Registered signal handler for HUP
16/11/18 12:36:57 INFO SignalUtils: Registered signal handler for INT
16/11/18 12:36:57 WARN MasterArguments: SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST
16/11/18 12:36:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/18 12:36:58 INFO SecurityManager: Changing view acls to: vinay
16/11/18 12:36:58 INFO SecurityManager: Changing modify acls to: vinay
16/11/18 12:36:58 INFO SecurityManager: Changing view acls groups to: 
16/11/18 12:36:58 INFO SecurityManager: Changing modify acls groups to: 
16/11/18 12:36:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(vinay); groups with view permissions: Set(); users  with modify permissions: Set(vinay); groups with modify permissions: Set()
16/11/18 12:36:59 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
16/11/18 12:36:59 INFO Master: Starting Spark master at spark://192.168.97.128:7077
16/11/18 12:36:59 INFO Master: Running Spark version 2.0.1
16/11/18 12:36:59 INFO Utils: Successfully started service 'MasterUI' on port 8080.
16/11/18 12:36:59 INFO MasterWebUI: Bound MasterWebUI to 192.168.97.128, and started at http://192.168.97.128:8080
16/11/18 12:36:59 INFO Utils: Successfully started service on port 6066.
16/11/18 12:36:59 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
16/11/18 12:36:59 INFO Master: I have been elected leader! New state: ALIVE
16/11/18 12:38:58 INFO Master: 192.168.97.128:34770 got disassociated, removing it.

SimpleApp.java

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
  public static void main(String[] args) {
    System.out.println("hello world!!");
    String logFile = "/usr/local/spark/README.md"; // Should be some file on your system
    SparkConf conf = new SparkConf().setAppName("Simple Application");
    conf.setMaster("spark://192.168.97.128:7077");
    // conf.set(key, value)
    // conf.setMaster("local[4]");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logData = sc.textFile(logFile).cache();

    // Count the lines containing 'a' and 'b' respectively
    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

    sc.stop();
  }
}
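
As an aside, the memory an application requests per executor can also be capped from the driver side via SparkConf. The sketch below is illustrative and not part of the original program; spark.executor.memory defaults to 1g, and 512m is an assumed value chosen to fit a small worker (see the "Initial job has not accepted any resources" warnings in Update 2 below):

import org.apache.spark.SparkConf;

public class MemoryCappedConf {
  // Sketch: request less executor memory than the standalone worker can offer,
  // so the scheduler is actually able to place tasks. 512m is an assumed value.
  public static SparkConf build() {
    return new SparkConf()
        .setAppName("Simple Application")
        .setMaster("spark://192.168.97.128:7077")
        .set("spark.executor.memory", "512m");
  }
}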

The following configuration entries were modified in spark-env.sh:

SPARK_MASTER_HOST=192.168.97.128
SPARK_MASTER_IP=192.168.97.128
SPARK_LOCAL_IP=192.168.97.128  
SPARK_PUBLIC_DNS=192.168.97.128
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g

Environment variables:

SPARK_LOCAL_IP=192.168.97.128
SPARK_MASTER_IP=192.168.97.128

Update 1: free -m output

vinay@localhost:/usr/local/spark/sbin$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7875        4500         970         531        2404        2756
Swap:          8082           6        8076

Update 2: program output

16/11/18 15:33:05 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/11/18 15:33:05 INFO Remoting: Starting remoting
16/11/18 15:33:05 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.97.128:43526]
16/11/18 15:33:05 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.97.128:43526]
16/11/18 15:33:05 INFO spark.SparkEnv: Registering BlockManagerMaster
16/11/18 15:33:06 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20161118153305-9cf5
16/11/18 15:33:06 INFO storage.MemoryStore: MemoryStore started with capacity 1050.6 MB.
16/11/18 15:33:06 INFO network.ConnectionManager: Bound socket to port 46557 with id = ConnectionManagerId(192.168.97.128,46557)
16/11/18 15:33:06 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/11/18 15:33:06 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager 192.168.97.128:46557 with 1050.6 MB RAM
16/11/18 15:33:06 INFO storage.BlockManagerMaster: Registered BlockManager
16/11/18 15:33:06 INFO spark.HttpServer: Starting HTTP Server
16/11/18 15:33:06 INFO server.Server: jetty-7.6.8.v20121106
16/11/18 15:33:06 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:33688
16/11/18 15:33:06 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.97.128:33688
16/11/18 15:33:06 INFO spark.SparkEnv: Registering MapOutputTracker
16/11/18 15:33:06 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-633ba798-963f-4b02-ab23-1edb4e677fde
16/11/18 15:33:06 INFO spark.HttpServer: Starting HTTP Server
16/11/18 15:33:06 INFO server.Server: jetty-7.6.8.v20121106
16/11/18 15:33:06 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46433
16/11/18 15:33:06 INFO server.Server: jetty-7.6.8.v20121106
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
16/11/18 15:33:06 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/11/18 15:33:06 INFO ui.SparkUI: Started Spark Web UI at http://192.168.97.128:4040
16/11/18 15:33:06 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.97.128:7077...
16/11/18 15:33:07 INFO storage.MemoryStore: ensureFreeSpace(32856) called with curMem=0, maxMem=1101633945
16/11/18 15:33:07 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 32.1 KB, free 1050.6 MB)
16/11/18 15:33:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/18 15:33:08 WARN snappy.LoadSnappy: Snappy native library not loaded
16/11/18 15:33:08 INFO mapred.FileInputFormat: Total input paths to process : 1
16/11/18 15:33:08 INFO spark.SparkContext: Starting job: count at SimpleApp.java:20
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Got job 0 (count at SimpleApp.java:20) with 2 output partitions (allowLocal=false)
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Final stage: Stage 0 (count at SimpleApp.java:20)
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Missing parents: List()
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter at SimpleApp.java:18), which has no missing parents
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (FilteredRDD[2] at filter at SimpleApp.java:18)
16/11/18 15:33:08 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/11/18 15:33:23 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
16/11/18 15:33:26 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.97.128:7077...
16/11/18 15:33:38 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
16/11/18 15:33:46 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.97.128:7077...
16/11/18 15:33:53 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
16/11/18 15:34:06 ERROR client.AppClient$ClientActor: All masters are unresponsive! Giving up.
16/11/18 15:34:06 ERROR cluster.SparkDeploySchedulerBackend: Spark cluster looks dead, giving up.
16/11/18 15:34:06 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/11/18 15:34:06 INFO scheduler.DAGScheduler: Failed to run count at SimpleApp.java:20
Exception in thread "main" org.apache.spark.SparkException: Job aborted: Spark cluster looks down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

1 Answer:

Answer 0 (score: 2)

The available memory is 970 MB, but you have configured the worker with 2 GB (SPARK_WORKER_MEMORY=2g). Try setting SPARK_WORKER_MEMORY to 500 MB and retry.
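
For example, in spark-env.sh (the 500m figure simply mirrors the suggestion above; pick a value below what free -m reports as available, then restart the worker):

# spark-env.sh: give the worker less memory than is actually free on the host
SPARK_WORKER_MEMORY=500m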

Hope this helps.