使用Storm 1.2.2
Kafka 1.1.0
提交拓扑后,主管启动工作进程。在检查启动的工作进程的worker.log文件时,发现在加载所有执行程序之间的某个位置,工作进程被主管杀死。
以下是主管日志,
{"@timestamp":"2020-01-09 11:18:57,719","message":"SLOT 6700: Assignment Changed from LocalAssignment(topology_id:trident-Topology-578320979, executors:[ExecutorInfo(task_start:22, task_end:22), ExecutorInfo(task_start:2, task_end:2), ExecutorInfo(task_start:42, task_end:42), ExecutorInfo(task_start:18, task_end:18), ExecutorInfo(task_start:10, task_end:10), ExecutorInfo(task_start:14, task_end:14), ExecutorInfo(task_start:6, task_end:6), ExecutorInfo(task_start:38, task_end:38), ExecutorInfo(task_start:30, task_end:30), ExecutorInfo(task_start:34, task_end:34), ExecutorInfo(task_start:50, task_end:50), ExecutorInfo(task_start:46, task_end:46), ExecutorInfo(task_start:26, task_end:26), ExecutorInfo(task_start:39, task_end:39), ExecutorInfo(task_start:47, task_end:47), ExecutorInfo(task_start:7, task_end:7), ExecutorInfo(task_start:51, task_end:51), ExecutorInfo(task_start:3, task_end:3), ExecutorInfo(task_start:35, task_end:35), ExecutorInfo(task_start:31, task_end:31), ExecutorInfo(task_start:27, task_end:27), ExecutorInfo(task_start:43, task_end:43), ExecutorInfo(task_start:23, task_end:23), ExecutorInfo(task_start:11, task_end:11), ExecutorInfo(task_start:19, task_end:19), ExecutorInfo(task_start:15, task_end:15), ExecutorInfo(task_start:24, task_end:24), ExecutorInfo(task_start:12, task_end:12), ExecutorInfo(task_start:8, task_end:8), ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:32, task_end:32), ExecutorInfo(task_start:40, task_end:40), ExecutorInfo(task_start:36, task_end:36), ExecutorInfo(task_start:28, task_end:28), ExecutorInfo(task_start:20, task_end:20), ExecutorInfo(task_start:16, task_end:16), ExecutorInfo(task_start:48, task_end:48), ExecutorInfo(task_start:44, task_end:44), ExecutorInfo(task_start:21, task_end:21), ExecutorInfo(task_start:33, task_end:33), ExecutorInfo(task_start:41, task_end:41), ExecutorInfo(task_start:37, task_end:37), ExecutorInfo(task_start:1, task_end:1), ExecutorInfo(task_start:9, task_end:9), ExecutorInfo(task_start:13, task_end:13), ExecutorInfo(task_start:17, task_end:17), ExecutorInfo(task_start:5, task_end:5), ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:25, task_end:25), ExecutorInfo(task_start:45, task_end:45), ExecutorInfo(task_start:49, task_end:49)], resources:WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0), owner:root) to null","thread_name":"SLOT_6700","level":"WARN"}
{"@timestamp":"2020-01-09 11:18:57,724","message":"Killing 29a1f333-55f1-45c2-988d-daf0712c2862:5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:00,808","message":"STATE RUNNING msInState: 120187 topo:trident-Topology-578320979 worker:5e19382e-c3e5-4c8d-8706-185e00e658a8 -> KILL msInState: 0 topo:trident-Topology-578320979 worker:5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:00,809","message":"GET worker-user for 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:00,828","message":"SLOT 6700 force kill and wait...","thread_name":"SLOT_6700","level":"WARN"}
{"@timestamp":"2020-01-09 11:19:00,831","message":"Force Killing 29a1f333-55f1-45c2-988d-daf0712c2862:5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:01,432","message":"Worker Process 5e19382e-c3e5-4c8d-8706-185e00e658a8 exited with code: 137","thread_name":"Thread-30","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,851","message":"GET worker-user for 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,858","message":"SLOT 6700 all processes are dead...","thread_name":"SLOT_6700","level":"WARN"}
{"@timestamp":"2020-01-09 11:19:03,859","message":"Cleaning up 29a1f333-55f1-45c2-988d-daf0712c2862:5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,859","message":"GET worker-user for 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,859","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8/pids/3100","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,860","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8/heartbeats","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,871","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8/pids","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,872","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8/tmp","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,872","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,873","message":"REMOVE worker-user 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,874","message":"Deleting path /data/workers-users/5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,876","message":"Removed Worker ID 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"@timestamp":"2020-01-09 11:19:03,876","message":"STATE KILL msInState: 3068 topo:trident-Topology-578320979 worker:null -> EMPTY msInState: 0","thread_name":"SLOT_6700","level":"INFO"}
此ID为5e19382e-c3e5-4c8d-8706-185e00e658a8
的工作程序被杀死后,主管使用不同的ID启动新的工作程序进程,并再次开始执行程序的加载,然后在某些执行程序完成加载后,该工作程序进程将收到一个杀死主管的信号。
以下是在6700端口的工作日志,
...
2020-01-09 14:42:19.455 o.a.s.d.executor main [INFO] Loading executor b-14:[10 10]
2020-01-09 14:42:20.942 o.a.s.d.executor main [INFO] Loaded executor tasks b-14:[10 10]
2020-01-09 14:42:20.945 o.a.s.d.executor main [INFO] Finished loading executor b-14[10 10]
2020-01-09 14:42:20.962 o.a.s.d.executor main [INFO] Loading executor b-39:[37 37]
2020-01-09 14:42:22.547 o.a.s.d.executor main [INFO] Loaded executor tasks b-39:[37 37]
2020-01-09 14:42:22.549 o.a.s.d.executor main [INFO] Finished loading executor b-39:[37 37]
2020-01-09 14:42:22.566 o.a.s.d.executor main [INFO] Loading executor b-5:[46 46]
2020-01-09 14:42:25.267 o.a.s.d.executor main [INFO] Loaded executor tasks b-5:[46 46]
2020-01-09 14:42:25.269 o.a.s.d.executor main [INFO] Finished loading executor b-5:[46 46]
2020-01-09 14:42:31.175 o.a.s.d.executor main [INFO] Loading executor b-0:[4 4]
2020-01-09 14:42:37.512 o.s.c.n.e.InstanceInfoFactory Thread-10 [INFO] Setting initial instance status as: STARTING
2020-01-09 14:42:37.637 o.s.s.c.ThreadPoolTaskScheduler [Ljava.lang.String;@174cb0d8.container-0-C-1 [INFO] Shutting down ExecutorService
2020-01-09 14:42:37.851 o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer [Ljava.lang.String;@174cb0d8.container-0-C-1 [INFO] Consumer stopped
2020-01-09 14:42:37.855 o.s.i.k.i.KafkaMessageDrivenChannelAdapter Thread-10 [INFO] stopped org.springframework.integration.kafka.inbound.KafkaMessageDrivenChannelAdapter@2459333a
2020-01-09 14:42:37.870 o.s.s.c.ThreadPoolTaskScheduler [Ljava.lang.String;@6e355249.container-0-C-1 [INFO] Shutting down ExecutorService
2020-01-09 14:42:38.054 o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer [Ljava.lang.String;@6e355249.container-0-C-1 [INFO] Consumer stopped
此后,它将再次从“为trident-Topology-578320979启动工人...”并加载所有执行程序和任务。
有人可以解释"Worker Process 5e19382e-c3e5-4c8d-8706-185e00e658a8 exited with code: 137"
是什么意思吗?
下面的链接[https://issues.apache.org/jira/browse/STORM-2176]解释了配置属性supervisor.worker.shutdown.sleep.secs
(默认设置为1秒)。这对应于主管在以kill -9强行杀死工人之前将等待工人正常离开的时间。发生这种情况时,主管将记录该工人以退出代码137(128 + 9)终止。
这是否有助于增加supervisor.worker.shutdown.sleep.secs
的价值?
还是因为JVM没有足够的内存?但是然后,它应该在线程“ main” java.lang.OutOfMemoryError:Java堆空间中抛出 Exception ,而在任何日志中,都没有此类异常可见。
建议尝试使用 storm.yaml 中的配置设置(“ worker.childopts”)增加JVM内存。
任何帮助将不胜感激。
P.S。几天以来一直试图找出解决方案,但没有成功。