Spark 1.3 workers accept the job, but the console says resources are not available

Time: 2015-03-26 21:09:56

Tags: apache-spark

I am trying to run Apache Spark 1.3 in standalone mode on Amazon EMR (Amazon's Hadoop 2.4) with 2 workers. When I do, I get the following message:

  

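[TaskSchedulerImpl] - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I am setting the following parameters: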

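import org.apache.spark.SparkConf;

// Application name, memory per executor, and a cap on the total cores the app may use.
SparkConf conf = new SparkConf();
conf.setAppName("SVM Classifier Example");
conf.set("spark.executor.memory", "1024m");
conf.set("spark.cores.max", "1");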

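But when I run it locally (with Apache Hadoop 2.4 and Spark 1.3), it finishes in a few seconds. I checked that each worker machine has about 1.6 GB of free memory in both cases, so memory is not the problem.

Here is the log from one of the workers: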

15/03/26 20:54:27 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/03/26 20:54:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/26 20:54:29 INFO spark.SecurityManager: Changing view acls to: root
15/03/26 20:54:29 INFO spark.SecurityManager: Changing modify acls to: root
15/03/26 20:54:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/03/26 20:54:30 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/03/26 20:54:31 INFO Remoting: Starting remoting
15/03/26 20:54:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@ip-XXXX.ec2.internal:50899]
15/03/26 20:54:31 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 50899.
15/03/26 20:54:32 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@ip-XXXX.ec2.internal:49161] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://sparkDriver@ip-XXXX.ec2.internal:49161]].

I cannot figure out what is going wrong. Any comments and suggestions are appreciated.

Edit: I could not upload a screenshot of my master UI, but here are the details:

> Worker Id   Cores        Memory
> 1           8 (8 Used)   1172.0 MB (1024.0 MB Used)
> 2           8 (8 Used)   1536.0 MB (1024.0 MB Used)
>
> Running Applications
> ID   Cores   Memory per Node   User   State     Duration
> 1    16      1024.0 MB         root   Running   1.5h

1 Answer:

Answer 0 (score: 0)

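So the problem turned out to be the firewall on my system. The firewall policy allowed the workers to communicate with the master, but not with the driver. That matches the worker log in the question, where the association with akka.tcp://sparkDriver@ip-XXXX.ec2.internal:49161 failed. Opening the ports for two-way communication solved my problem.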
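
For anyone hitting the same symptom, a quick way to confirm this kind of firewall problem is to try a plain TCP connection from a worker machine to the driver address that appears in the worker log. The following is only a minimal sketch using the Java standard library: the class name `DriverPortCheck` and its methods are made up for illustration and are not part of Spark. Note also that the driver port is chosen at random on each run unless `spark.driver.port` is set, so the port has to be taken from the current log (or pinned in the configuration) before a firewall rule can be written for it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Spark): checks whether a TCP port is reachable.
class DriverPortCheck {

    // Matches addresses of the form akka.tcp://name@host:port, as printed in the worker log.
    private static final Pattern AKKA_ADDRESS =
            Pattern.compile("akka\\.tcp://[^@\\s]+@([^:\\]\\s]+):(\\d+)");

    // Extracts host and port from the first akka.tcp address found in the text.
    // Returns an unresolved address (no DNS lookup), or null if nothing matches.
    static InetSocketAddress parseAkkaAddress(String text) {
        Matcher m = AKKA_ADDRESS.matcher(text);
        if (!m.find()) {
            return null;
        }
        return InetSocketAddress.createUnresolved(m.group(1), Integer.parseInt(m.group(2)));
    }

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    // Any I/O failure (refused, timed out, unknown host) is reported as false.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Usage: java DriverPortCheck "akka.tcp://sparkDriver@<driver-host>:<port>"
    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java DriverPortCheck <akka.tcp address>");
            return;
        }
        InetSocketAddress target = parseAkkaAddress(args[0]);
        if (target == null) {
            System.err.println("no akka.tcp address found in: " + args[0]);
            return;
        }
        boolean ok = canConnect(target.getHostString(), target.getPort(), 3000);
        System.out.println(target.getHostString() + ":" + target.getPort()
                + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Run it on a worker machine while the driver is still up, passing the sparkDriver address copied from the worker log. If the port is reported as not reachable from the worker while the application is running, a firewall or security group rule between the worker and the driver is a likely cause. A successful connection only shows that the port is open at that moment; it does not prove that the rest of the Spark setup is correct.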