无法在Apache Storm群集

时间:2015-05-04 04:16:01

标签: apache-storm supervisor

我正在关注http://jayatiatblogs.blogspot.com/2011/11/storm-installation.html尝试在Amazon Web Services上使用少量虚拟机(EC2)和Ubuntu 14.04 LTS配置Apache Storm远程群集。

我的主节点是10.0.0.230,我的从节点是10.0.0.79。我的zookeeper驻留在我的主节点中。当我在主节点中使用 storm jar storm-starter-0.9.4-jar-with-dependencies.jar storm.starter.RollingTopWords production-topology remote 时,会出现以下消息出现,表示已成功提交:

339  [main] INFO  storm.starter.RollingTopWords - Topology name: production-topology
377  [main] INFO  storm.starter.RollingTopWords - Running in remote (cluster) mode
651  [main] INFO  backtype.storm.StormSubmitter - Jar not uploaded to master yet. Submitting jar...
655  [main] INFO  backtype.storm.StormSubmitter - Uploading topology jar storm-starter-0.9.4-jar-with-dependencies.jar to assigned location: /home/ubuntu/storm/data/nimbus/inbox/stormjar-380bb1a2-1699-4ad1-8341-3d4b92c14764.jar
672  [main] INFO  backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /home/ubuntu/storm/data/nimbus/inbox/stormjar-380bb1a2-1699-4ad1-8341-3d4b92c14764.jar
672  [main] INFO  backtype.storm.StormSubmitter - Submitting topology production-topology in distributed mode with conf {"topology.debug":true}
714  [main] INFO  backtype.storm.StormSubmitter - Finished submitting topology: production-topology

Stoum UI& 风暴列表 命令显示拓扑处于活动状态:

Topology_name        Status     Num_tasks  Num_workers  Uptime_secs
-------------------------------------------------------------------
production-topology  ACTIVE     0          0            59

但是,在Storm UI的 Cluster Summary 中,有0个supervisor,0个使用的槽,0个空闲槽,0个执行器和& 0个任务。在拓扑配置中, supervisor.slots.ports 表明它使用主节点的默认超级用户插槽端口,而不是从属节点的超级用户插槽端口。

以下是我的主节点的 zoo.cfg

tickTime=2000
dataDir=/home/ubuntu/zookeeper-data
clientPort=2181

我的主节点的 storm.yaml

 storm.zookeeper.servers:
     - "10.0.0.230"
 storm.zookeeper.port: 2181

 nimbus.host: "localhost"
 nimbus.thrift.port: 6627
 nimbus.task.launch.secs: 240

 supervisor.worker.start.timeout.secs: 240
 supervisor.worker.timeout.secs: 240

 storm.local.dir: "/home/ubuntu/storm/data"   
 java.library.path: "/usr/lib/jvm/java-7-oracle"

我的从属节点的 storm.yaml

 storm.zookeeper.server:
     - "10.0.0.230"
 storm.zookeeper.port: 2181
 nimbus.host: "10.0.0.230"
 nimbus.thrift.port: 6627

 storm.local.dir: "/home/ubuntu/storm/data"
 java.library.path: "/usr/lib/jvm/java-7-oracle"

 supervisor.slots.ports:
     - 6700
     - 6701
     - 6702
     - 6703
     - 6704

我使用 zkCli.sh -server 10.0.0.230:2181 连接到主节点上的zookeeper,它运行正常:

2015-05-04 03:40:20,866 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=10.0.0.230:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@63f78dde
2015-05-04 03:40:20,888 [myid:] - INFO  [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@975] - Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
Welcome to ZooKeeper!
2015-05-04 03:40:20,900 [myid:] - INFO  [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@852] - Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
JLine support is enabled
2015-05-04 03:40:20,918 [myid:] - INFO  [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d1ca1ab73001c, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: 10.0.0.230:2181(CONNECTED) 0]

以下是来自我的从属节点的超级用户日志:

2015-05-06T06:16:28.487+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:16:28.487+0000 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_80]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) ~[na:1.7.0_80]
        at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[storm-core-0.9.4.jar:0.9.4]
2015-05-06T06:16:28.589+0000 b.s.d.supervisor [ERROR] Error on initialization of server mk-supervisor
java.lang.RuntimeException: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storm
        at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.zookeeper$exists_node_QMARK_$fn__807.invoke(zookeeper.clj:102) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:98) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:114) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.cluster$mk_distributed_cluster_state.invoke(cluster.clj:43) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.cluster$mk_storm_cluster_state.invoke(cluster.clj:238) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:214) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$fn__5518$exec_fn__1754__auto____5519.invoke(supervisor.clj:409) ~[storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.applyToHelper(AFn.java:167) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor$fn__5518$mk_supervisor__5544.doInvoke(supervisor.clj:405) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:629) [storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:659) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.applyToHelper(AFn.java:159) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.9.4.jar:0.9.4]
Caused by: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storm
        at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:99) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.zookeeper$exists_node_QMARK_$fn__807.invoke(zookeeper.clj:101) ~[storm-core-0.9.4.jar:0.9.4]
        ... 16 common frames omitted
2015-05-06T06:16:28.607+0000 b.s.util [ERROR] Halting process: ("Error on initialization")
java.lang.RuntimeException: ("Error on initialization")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor$fn__5518$mk_supervisor__5544.doInvoke(supervisor.clj:405) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:629) [storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:659) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.applyToHelper(AFn.java:159) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.9.4.jar:0.9.4]

以下是我主节点的nimbus日志:

2015-05-06T06:14:19.291+0000 b.s.d.nimbus [INFO] Using default scheduler
2015-05-06T06:14:19.304+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5]
2015-05-06T06:14:19.415+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-05-06T06:14:19.417+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.0.230:2181 sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@795bca46
2015-05-06T06:14:19.436+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:14:19.448+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
2015-05-06T06:14:19.457+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d27dbda310000, negotiated timeout = 20000
2015-05-06T06:14:19.459+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2015-05-06T06:14:19.460+0000 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2015-05-06T06:14:20.485+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut down
2015-05-06T06:14:20.485+0000 o.a.s.z.ZooKeeper [INFO] Session: 0x14d27dbda310000 closed
2015-05-06T06:14:20.486+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5]
2015-05-06T06:14:20.487+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-05-06T06:14:20.487+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.0.230:2181/storm sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@510d246b
2015-05-06T06:14:20.504+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:14:20.505+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
2015-05-06T06:14:20.507+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d27dbda310001, negotiated timeout = 20000
2015-05-06T06:14:20.507+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2015-05-06T06:14:20.547+0000 b.s.d.nimbus [INFO] Starting Nimbus server...

我曾使用 风暴雨云 &我的主节点中的 风暴ui ,我的从属节点中的 风暴主管

从我的slave节点的supervisor.logs中,它显示我的slave节点倾向于连接到本地主机上的zookeeper,虽然我已经在我的slave节点的storm.yaml中指定我的zookeeper在我的主节点中。 为什么会发生这种情况以及如何解决这个问题?

那么,为什么在Storm UI的 Cluster Summary 中,有0个主管,0个使用的槽,0个空闲槽,0个执行器和& 0个任务? 为什么它使用主节点的主管插槽端口而不是从节点?

当我单击Storm UI的拓扑摘要中的生产拓扑时,有0个Num工作者,0个Num执行程序,0个Num任务? 为什么没有Spouts&的信息显示螺栓?

1 个答案:

答案 0 :(得分:0)

我发现了这个问题。我应该将我的zookeeper设置在我的slave节点上,而不是我的主节点上。现在问题解决了风暴群落了。