How to run a Cassandra Docker container from Nomad?

Date: 2018-12-06 14:46:19

Tags: docker cassandra nomad

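I want to run a Cassandra container from a Nomad job. It seems to start, but dies after a few seconds (it appears to be killed by Nomad).

If I run the container from the command line with: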

docker run --name some-cassandra -p 9042:9042 -d cassandra:3.0

the container starts perfectly. However, if I create a Nomad job like this:

job "cassandra" {

  datacenters = ["dc1"]

  type = "service"

  update {
    max_parallel = 1
    min_healthy_time = "10s"
    healthy_deadline = "5m"
    progress_deadline = "10m"
    auto_revert = false
    canary = 0
  }

  migrate {
    max_parallel = 1
    health_check = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }

  group "cassandra" {
    restart {
      attempts = 2
      interval = "240s"
      delay = "120s"
      mode = "delay"
    }

    task "cassandra" {
      driver = "docker"

      config {
        image = "cassandra:3.0"
        network_mode = "bridge"
        port_map {
          cql = 9042
        }
      }

      resources {
        memory = 2048
        cpu = 800
        network {
          port "cql" {}
        }
      }

      env {
        CASSANDRA_LISTEN_ADDRESS = "${NOMAD_IP_cql}"
      }

      service {
        name = "cassandra"
        tags = ["global", "cassandra"]
        port = "cql"
      }
    }
  }
}

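then it never starts. Nomad's web UI shows nothing in the stdout log of the created allocation, and the stderr stream shows only Killed.

I know that in this case the Docker container is created and then removed after a few seconds. I can't read the logs of these containers, because when I try docker logs <container_id>, all I get is: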

Error response from daemon: configured logging driver does not support reading

and the allocation overview shows this message:

12/06/18 14:16:04   Terminated  Exit Code: 137, Exit Message: "Docker container exited with non-zero exit code: 137"

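According to the Docker documentation:

  If there is no database initialized when the container starts, then a default database will be created. While this is the expected behavior, it means that the container will not accept incoming connections until that initialization completes. This may cause issues when using automation tools, such as docker-compose, that start several containers simultaneously.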

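But I doubt this is the root of the problem, because increasing the values in the restart stanza had no effect, and because the task fails after only a few seconds.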

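A while ago I ran into a similar problem with a Kafka container which, as it turned out, was unhappy because it needed more memory. In this case, however, I have given generous values for memory and CPU in the resources stanza, and that doesn't seem to make any difference.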

My host OS is Arch Linux, with kernel 4.19.4-arch1-1-ARCH. I'm running Consul as a systemd service, and I run the Nomad agent with this command line:

sudo nomad agent -dev

What could I be missing? Any help and/or pointers would be much appreciated.

UPDATE (2018-12-06 16:26 GMT): By reading the Nomad agent's output in detail, I found that some useful information can be read in the host's /tmp directory. A snippet of that output:

2018/12/06 16:03:03 [DEBUG] memberlist: TCP connection from=127.0.0.1:45792
2018/12/06 16:03:03.180586 [DEBUG] driver.docker: docker pull cassandra:latest succeeded
2018-12-06T16:03:03.184Z [DEBUG] plugin: starting plugin: path=/usr/bin/nomad args="[/usr/bin/nomad executor {"LogFile":"/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/cassandra/executor.out","LogLevel":"DEBUG"}]"
2018-12-06T16:03:03.185Z [DEBUG] plugin: waiting for RPC address: path=/usr/bin/nomad
2018-12-06T16:03:03.235Z [DEBUG] plugin.nomad: plugin address: timestamp=2018-12-06T16:03:03.235Z address=/tmp/plugin681788273 network=unix
2018/12/06 16:03:03.253166 [DEBUG] driver.docker: Setting default logging options to syslog and unix:///tmp/plugin559865372
2018/12/06 16:03:03.253196 [DEBUG] driver.docker: Using config for logging: {Type:syslog ConfigRaw:[] Config:map[syslog-address:unix:///tmp/plugin559865372]}
2018/12/06 16:03:03.253206 [DEBUG] driver.docker: using 2147483648 bytes memory for cassandra
2018/12/06 16:03:03.253217 [DEBUG] driver.docker: using 800 cpu shares for cassandra
2018/12/06 16:03:03.253237 [DEBUG] driver.docker: binding directories []string{"/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/alloc:/alloc", "/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/cassandra/local:/local", "/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/cassandra/secrets:/secrets"} for cassandra
2018/12/06 16:03:03.253282 [DEBUG] driver.docker: allocated port 127.0.0.1:29073 -> 9042 (mapped)
2018/12/06 16:03:03.253296 [DEBUG] driver.docker: exposed port 9042
2018/12/06 16:03:03.253320 [DEBUG] driver.docker: setting container name to: cassandra-1c315bf2-688c-2c7b-8d6f-f71fec1254f3
2018/12/06 16:03:03.361162 [INFO] driver.docker: created container 29b0764bd2de69bda6450ebb1a55ffd2cbb4dc3002f961cb5db71b323d611199
2018/12/06 16:03:03.754476 [INFO] driver.docker: started container 29b0764bd2de69bda6450ebb1a55ffd2cbb4dc3002f961cb5db71b323d611199
2018/12/06 16:03:03.757642 [DEBUG] consul.sync: registered 1 services, 0 checks; deregistered 0 services, 0 checks
2018/12/06 16:03:03.765001 [DEBUG] client: error fetching stats of task cassandra: stats collection hasn't started yet
2018/12/06 16:03:03.894514 [DEBUG] client: updated allocations at index 371 (total 2) (pulled 0) (filtered 2)
2018/12/06 16:03:03.894584 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 2)
2018/12/06 16:03:05.190647 [DEBUG] driver.docker: error collecting stats from container 29b0764bd2de69bda6450ebb1a55ffd2cbb4dc3002f961cb5db71b323d611199: io: read/write on closed pipe
2018-12-06T16:03:09.191Z [DEBUG] plugin.nomad: 2018/12/06 16:03:09 [ERR] plugin: plugin server: accept unix /tmp/plugin681788273: use of closed network connection
2018-12-06T16:03:09.194Z [DEBUG] plugin: plugin process exited: path=/usr/bin/nomad
2018/12/06 16:03:09.223734 [INFO] client: task "cassandra" for alloc "1c315bf2-688c-2c7b-8d6f-f71fec1254f3" failed: Wait returned exit code 137, signal 0, and error Docker container exited with non-zero exit code: 137
2018/12/06 16:03:09.223802 [INFO] client: Restarting task "cassandra" for alloc "1c315bf2-688c-2c7b-8d6f-f71fec1254f3" in 2m7.683274502s
2018/12/06 16:03:09.230053 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 1 services, 0 checks
2018/12/06 16:03:09.233507 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks
2018/12/06 16:03:09.296185 [DEBUG] client: updated allocations at index 372 (total 2) (pulled 0) (filtered 2)
2018/12/06 16:03:09.296313 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 2)
2018/12/06 16:03:11.541901 [DEBUG] http: Request GET /v1/agent/health?type=client (452.678µs)

But the contents of /tmp/NomadClient.../<alloc_id>/... look deceptively simple:

[root@singularity 1c315bf2-688c-2c7b-8d6f-f71fec1254f3]# ls -lR
.:
total 0
drwxrwxrwx 5 nobody nobody 100 Dec 6 15:52 alloc
drwxrwxrwx 5 nobody nobody 120 Dec 6 15:53 cassandra

./alloc:
total 0
drwxrwxrwx 2 nobody nobody 40 Dec 6 15:52 data
drwxrwxrwx 2 nobody nobody 80 Dec 6 15:53 logs
drwxrwxrwx 2 nobody nobody 40 Dec 6 15:52 tmp

./alloc/data:
total 0

./alloc/logs:
total 0
-rw-r--r-- 1 root root 0 Dec 6 15:53 cassandra.stderr.0
-rw-r--r-- 1 root root 0 Dec 6 15:53 cassandra.stdout.0

./alloc/tmp:
total 0

./cassandra:
total 4
-rw-r--r-- 1 root root 1248 Dec 6 16:19 executor.out
drwxrwxrwx 2 nobody nobody 40 Dec 6 15:52 local
drwxrwxrwx 2 nobody nobody 60 Dec 6 15:52 secrets
drwxrwxrwt 2 nobody nobody 40 Dec 6 15:52 tmp

./cassandra/local:
total 0

./cassandra/secrets:
total 0

./cassandra/tmp:
total 0

cassandra.stderr.0 and cassandra.stdout.0 are both empty, and the full content of the executor.out file is:

UPDATE (2018-12-06 16:40 GMT): Since the agent apparently wants to log to syslog, I set up and started a local syslog server, to no avail: the syslog server receives no messages at all.

1 Answer

Answer 0 (score: 1)

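Problem solved. Its nature was twofold:

  • Nomad's Docker driver encapsulates the behaviour of containers (very effectively), which sometimes makes them very silent.

  • Cassandra is very resource-hungry, more than I had originally thought. I was convinced that 4 GB of RAM would be enough for it to run comfortably, but it turns out it needs 6 GB (at least in my environment).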
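In the job file from the question, this amounts to raising the memory value in the task's resources stanza. A minimal sketch, assuming the rest of the job stays exactly as posted; Nomad expresses memory in MB (the agent log in the question shows memory = 2048 turning into 2147483648 bytes), so 6 GB is 6144. The 6 GB figure is simply what worked in this answer's environment, not a general requirement:

```hcl
task "cassandra" {
  driver = "docker"

  config {
    image        = "cassandra:3.0"
    network_mode = "bridge"
    port_map {
      cql = 9042
    }
  }

  resources {
    # Memory is in MB: 6144 MB = 6 GB, the amount that worked in this
    # answer's environment (may need adjusting on other hosts).
    memory = 6144
    cpu    = 800
    network {
      port "cql" {}
    }
  }
}
```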

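Disclaimer: I'm now actually using bitnami/cassandra instead of cassandra, because I believe their images are of high quality, secure, and configurable through environment variables. I made this discovery using Bitnami's image and have not yet tested how the original image reacts to being given that much memory.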

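As for why the container doesn't fail when run directly from the Docker CLI, I think it's because no limits are imposed when running it that way. Docker simply lets its containers take as much memory as they need, so if the host eventually runs out of memory for all of its containers, you find out much later (and possibly painfully). Failing early should therefore be a welcome benefit of an orchestration platform such as Nomad. If I have any complaint, it's that the poor visibility into the container made finding the problem take so long!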
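If reserving 6 GB is not an option, one alternative that this answer did not test is to cap the JVM heap so that it fits inside the Nomad memory limit. A sketch, assuming the official cassandra image, whose startup goes through Cassandra's stock cassandra-env.sh; that script reads MAX_HEAP_SIZE and HEAP_NEWSIZE from the environment and refuses to start if only one of them is set. The sizes below are illustrative assumptions, not tuned values:

```hcl
env {
  CASSANDRA_LISTEN_ADDRESS = "${NOMAD_IP_cql}"

  # Illustrative sizes (assumption): keep the heap well below the task's
  # memory limit, because the JVM also uses memory outside the heap.
  # cassandra-env.sh requires these two variables to be set as a pair.
  MAX_HEAP_SIZE = "1G"
  HEAP_NEWSIZE  = "256M"
}
```

This keeps the original memory = 2048 reservation while leaving headroom for off-heap usage; whether 1G of heap is enough depends on the workload.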