Spring Boot Admin with Docker Swarm cannot fetch health metrics

Time: 2018-12-31 04:48:24

Tags: spring-boot docker docker-swarm spring-boot-actuator spring-boot-admin

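I am trying to get Spring Boot Admin (SBA) working in a Docker Swarm cluster using ZooKeeper service discovery, so that all clients are discovered dynamically as soon as they connect to ZooKeeper. The problem: all Docker services share the same overlay network, and I verified with docker exec that every container can ping every other one, yet Spring Boot Admin cannot reach the health actuator endpoint on the clients because the connection fails; the log below shows a connect timeout.

I also verified that the clients and the admin service are correctly connected to ZooKeeper, and that both ZooKeeper and the admin dashboard show those clients as registered.

To reproduce the issue, I created a simple docker compose file that deploys two actuator-enabled Spring Boot Admin applications on the same overlay network: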

version: '3.1'

services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        networks:
            - nsp_test
        deploy:
            restart_policy:
                condition: on-failure
            placement:
                constraints: [node.hostname == nj51nreda5v]
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888

    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        networks:
            - nsp_test
        deploy:
            restart_policy:
                condition: on-failure
            placement:
                constraints: [node.hostname == nj51nreda6v]
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888

    nspadmin:
        image: admin:77
        ports:
            - "9084:8080"
        networks:
            - nsp_test
        depends_on:
            - "zoo1"
            - "zoo2"
        deploy:
            restart_policy:
                condition: on-failure
            mode: global
        environment:
            ZK_HOST: zoo1:2181,zoo2:2182
            SPRING_PROFILES_ACTIVE: ssldev
networks:
    nsp_test:
      external:
        name: nsp_test

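With this configuration, both Spring Boot Admin dashboards are registered in ZooKeeper but show as OFFLINE (because the /health actuator endpoint cannot be reached).

These are the two addresses SBA registered for the clients:

https://10.255.0.19:8080/ OFFLINE
https://10.255.0.20:8080/ OFFLINE

The exception I get: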

2018-12-31 04:20:31.926  INFO 1 --- [    updateTask1] d.c.boot.admin.registry.StatusUpdater    : Couldn't retrieve status for Application [id=28eab1e1, name=nsp-admin, managementUrl=https://10.255.0.20:8080/, healthUrl=https://10.255.0.20:8080/health, serviceUrl=https://10.255.0.20:8080/]
org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://10.255.0.20:8080/health": Connect to 10.255.0.20:8080 [/10.255.0.20] failed: connect timed out; nested exception is org.apache.http.conn.ConnectTimeoutException: Connect to 10.255.0.20:8080 [/10.255.0.20] failed: connect timed out
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:666) ~[spring-web-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
        at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:628) ~[spring-web-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
        at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:549) ~[spring-web-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
        at de.codecentric.boot.admin.web.client.ApplicationOperations.doGet(ApplicationOperations.java:68) ~[spring-boot-admin-server-1.5.6.jar!/:1.5.6]
        at de.codecentric.boot.admin.web.client.ApplicationOperations.getHealth(ApplicationOperations.java:58) ~[spring-boot-admin-server-1.5.6.jar!/:1.5.6]
        at de.codecentric.boot.admin.registry.StatusUpdater.queryStatus(StatusUpdater.java:111) [spring-boot-admin-server-1.5.6.jar!/:1.5.6]
        at de.codecentric.boot.admin.registry.StatusUpdater.updateStatus(StatusUpdater.java:65) [spring-boot-admin-server-1.5.6.jar!/:1.5.6]
        at de.codecentric.boot.admin.registry.StatusUpdateApplicationListener$1.run(StatusUpdateApplicationListener.java:47) [spring-boot-admin-server-1.5.6.jar!/:1.5.6]
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_151]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_151]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_151]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to 10.255.0.20:8080 [/10.255.0.20] failed: connect timed out
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:89) ~[spring-web-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
        at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48) ~[spring-web-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
        at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53) ~[spring-web-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:652) ~[spring-web-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
        ... 15 common frames omitted
Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_151]
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[na:1.8.0_151]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[na:1.8.0_151]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[na:1.8.0_151]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_151]
        at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_151]
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339) ~[httpclient-4.5.3.jar!/:4.5.3]
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142) ~[httpclient-4.5.3.jar!/:4.5.3]

My SBA configuration yml:

server:
  port: 8080
spring:
  boot:
    admin:
      client:
        prefer-ip: false
  datasource:
    driverClassName: org.postgresql.Driver
    url: ${DB_URL}
    username: ${DB_USER}
    password: ${DB_PASSWORD}
  application:
    name: nsp-admin
  cloud:
    config:
      discovery:
        enabled: true
    zookeeper:
      connect-string: ${ZK_HOST}
      discovery:
        uri-spec: https://{address}:{port}
        metadata:
          management:
            context-path: /
          health:
            path: /health

management:
  security:
    enabled: false

security:
  basic:
    enabled: false

#security.require-ssl: true
server.ssl.enabled: true
server.ssl.key-store-type: PKCS12
server.ssl.key-store: *****
server.ssl.key-store-password: *****

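Update: After more debugging, I realized this is definitely related to the hostname/IP that the client registers in ZooKeeper.

When I curl from the SBA container to the client container using the Docker container ID as the hostname, the /health API responds.

This works: docker exec -it 8403c5001b9e curl -k https://bf41c73af594:8080/health

This does not work and times out: docker exec -it 8403c5001b9e curl -k https://10.255.0.20:8080/health

Is it possible to force ZooKeeper to register the hostname or the container ID?

Update: Setting spring.cloud.zookeeper.discovery.instanceHost: ${HOSTNAME} in my application.yml fixes the issue. It forces the correct container ID to be registered in ZooKeeper.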
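
For reference, here is a minimal sketch of where that property sits in the configuration shown earlier. It assumes the HOSTNAME environment variable is set inside the container (Docker sets it to the container's hostname, which defaults to the short container ID when no hostname is configured for the service). The 10.255.0.x addresses are probably from the swarm ingress network the service joins because it publishes a port; that is an assumption, not something the logs confirm.

```yaml
spring:
  cloud:
    zookeeper:
      connect-string: ${ZK_HOST}
      discovery:
        # Register the container's hostname instead of an auto-detected IP.
        # Assumption: HOSTNAME is present in the container environment
        # and resolves on the shared overlay network.
        instanceHost: ${HOSTNAME}
        uri-spec: https://{address}:{port}
```

With this in place, the health URL registered in ZooKeeper should use the container hostname, matching the curl command that worked above.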

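1 Answer:

Answer 0 (score: 0)

You don't need to go through all this circus. Docker has a concept called service discovery: local DNS resolution that Docker handles for you.

You can use the container name, or specify an alias, instead of the IP or container ID, since those change every time.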

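Method 1:

By default, Docker generates the container name for you (Compose builds it from the project and service names). You can pin a fixed name for the container with the container_name keyword in docker-compose, and then use that name instead of the IP. This resolves individual containers.

Example compose file: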

version: '3.1'

services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        container_name: zoo1
        networks:
            - nsp_test
        deploy:
            restart_policy:
                condition: on-failure
            placement:
                constraints: [node.hostname == nj51nreda5v]
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888

    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        container_name: zoo2
        networks:
            - nsp_test
        deploy:
            restart_policy:
                condition: on-failure
            placement:
                constraints: [node.hostname == nj51nreda6v]
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888

    nspadmin:
        image: admin:77
        ports:
            - "9084:8080"
        networks:
            - nsp_test
        depends_on:
            - "zoo1"
            - "zoo2"
        deploy:
            restart_policy:
                condition: on-failure
            mode: global
        environment:
            ZK_HOST: zoo1:2181,zoo2:2182
            SPRING_PROFILES_ACTIVE: ssldev
networks:
    nsp_test:
      external:
        name: nsp_test

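Now you can reach the containers as zoo1 and zoo2. This does not work in swarm mode, because container_name is ignored there.

Method 2 (recommended for docker swarm mode):

You can specify aliases for each service and use an alias to access the service.

Example compose file: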

version: '3.1'
services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        networks:
            default:
                aliases:
                    - zoo1
                    - zoo.1
        deploy:
            restart_policy:
                condition: on-failure
            placement:
                constraints: [node.hostname == nj51nreda5v]
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888

    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        networks:
            default:
                aliases:
                    - zoo2
                    - zoo.2
        deploy:
            restart_policy:
                condition: on-failure
            placement:
                constraints: [node.hostname == nj51nreda6v]
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888

    nspadmin:
        image: admin:77
        ports:
            - "9084:8080"
        networks:
            - default
        depends_on:
            - "zoo1"
            - "zoo2"
        deploy:
            restart_policy:
                condition: on-failure
            mode: global
        environment:
            ZK_HOST: zoo1:2181,zoo2:2182
            SPRING_PROFILES_ACTIVE: ssldev
networks:
    default:
      external:
        name: nsp_test

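Here zoo1 can be resolved as zoo1, zoo.1, zoo1.nsp_test, or zoo.1.nsp_test, and likewise for zoo2. This also works in swarm mode.

Method 3:

If you know what name the created service will get, you can also use it to resolve the container.

For example: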

version: '3.1'
services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        networks:
            - nsp_test
        deploy:
            restart_policy:
                condition: on-failure
            placement:
                constraints: [node.hostname == nj51nreda5v]
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888

    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        networks:
            - nsp_test
        deploy:
            restart_policy:
                condition: on-failure
            placement:
                constraints: [node.hostname == nj51nreda6v]
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888

    nspadmin:
        image: admin:77
        ports:
            - "9084:8080"
        networks:
            - nsp_test
        depends_on:
            - "zoo1"
            - "zoo2"
        deploy:
            restart_policy:
                condition: on-failure
            mode: global
        environment:
            ZK_HOST: zoo1:2181,zoo2:2182
            SPRING_PROFILES_ACTIVE: ssldev
networks:
    nsp_test:
      external:
        name: nsp_test

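Suppose the above configuration creates containers named zoo1_nsp_test and zoo2_nsp_test. You can also use these names to resolve the containers. This does not work across swarm nodes, because container names differ from host to host.

Note:
All of the methods above only work when the containers are attached to the same network.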

References:

  1. Compose file version 3 reference#container_name
  2. Compose file version 3 reference#aliases
  3. service discovery
  4. Load balancing, service discovery and security