Setup details:
We use Flink to perform real-time data analytics on streams coming from Apache Kafka, and for fault tolerance we use Hadoop, which stores Flink's checkpoints and savepoints.
Kafka, Flink, and Hadoop each run as clusters.
The whole setup is deployed with Docker Swarm.
Docker Swarm yml file for the Flink cluster:
version: '3.2'
services:
  jobmanager:
    image: flink:1.7-hadoop28-alpine
    hostname: jobmanager
    ports:
      - target: 8081
        published: 8081
        protocol: tcp
        mode: host
    deploy:
      placement:
        constraints: [node.ip == host1]
      endpoint_mode: dnsrr
    command: jobmanager
    volumes:
      - /etc/flink-cep:/etc/flink-cep
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager1:
    image: flink:1.7-hadoop28-alpine
    hostname: taskmanager1
    deploy:
      placement:
        constraints: [node.ip == host1]
      endpoint_mode: dnsrr
    command: taskmanager
    volumes:
      - /etc/flink-cep:/etc/flink-cep
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager2:
    image: flink:1.7-hadoop28-alpine
    hostname: taskmanager2
    deploy:
      placement:
        constraints: [node.ip == host2]
      endpoint_mode: dnsrr
    command: taskmanager
    volumes:
      - /etc/flink-cep:/etc/flink-cep
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager3:
    image: flink:1.7-hadoop28-alpine
    hostname: taskmanager3
    deploy:
      placement:
        constraints: [node.ip == host3]
      endpoint_mode: dnsrr
    command: taskmanager
    volumes:
      - /etc/flink-cep:/etc/flink-cep
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
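One modification sometimes suggested for idle-connection timeouts on Swarm overlay networks is setting container sysctls in the stack file. This is only a sketch of an assumption, not a verified fix: `sysctls` in a Swarm stack requires Compose file format 3.8 and Docker 19.03+, so it would not work with the `version: '3.2'` file above without upgrading; the value 600 is an illustrative example.

```yaml
# Hypothetical fragment — requires "version: '3.8'" and Docker 19.03+.
# Older engines/formats ignore or reject the sysctls key in Swarm mode.
services:
  taskmanager1:
    image: flink:1.7-hadoop28-alpine
    sysctls:
      # Send TCP keepalive probes before long-idle connections are dropped.
      net.ipv4.tcp_keepalive_time: 600
```

The same key would need to be added to every service that holds long-lived idle connections (jobmanager and all taskmanagers).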
Problem statement:
Timeouts occur frequently, and after a timeout this Flink job crashes.
Is there any modification to the above yml file that would handle the timeout scenario?
Is there any other workaround?
Please don't mark this as a duplicate, since I have not found a correct answer.
Reference:
We found this open issue on Docker's official page:
https://success.docker.com/article/ipvs-connection-timeout-issue
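The article above attributes such timeouts to IPVS expiring long-idle TCP connections (on the order of 900 seconds). A host-level mitigation discussed for this class of issue — a sketch only, with illustrative values, not verified against this particular Flink/Kafka setup — is to lower the kernel's TCP keepalive settings on each Swarm node so probes are sent before the idle timeout expires:

```shell
# Run as root on every Swarm node (values are examples).
# Start keepalive probes after 600 s of idleness, well under ~900 s:
sysctl -w net.ipv4.tcp_keepalive_time=600
# Probe every 60 s, give up after 3 failed probes:
sysctl -w net.ipv4.tcp_keepalive_intvl=60
sysctl -w net.ipv4.tcp_keepalive_probes=3
```

To persist across reboots, the same keys would go in /etc/sysctl.conf (or a file under /etc/sysctl.d/). Note this only helps sockets that actually enable SO_KEEPALIVE.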