Reinstall DCOS Master without breaking the cluster

Time: 2019-01-16 17:22:55

Tags: bigdata cluster-computing master mesosphere dcos

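I have an installed DCOS cluster with 3 masters and 3 slaves. It worked fine until /var on one of the masters reached 100% disk usage. After that, "dcos auth login" stopped working, and I receive an error message saying that an error occurred when I try to log in through the GUI.

Space has since been freed on /var, which is now at 84% usage. However, the problem persists. After waiting a long time, I tried restarting the chronyd service and dcos.target, but that did not help.

Now I have 3 masters with 20 services in the "activating" state, see below: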
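
For reference, a minimal sketch of how the usage of /var could be checked and the largest directories located. The function names `usage_pct` and `check_mount` are hypothetical helpers and not part of DC/OS; only standard `df`, `du`, `awk`, `sort`, and `tail` are assumed.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the
# usage percentage (5th column of the first data line) without "%".
usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: report how full a mount point is, then list
# the ten largest directories on that same filesystem.
check_mount() {
    mount_point="$1"
    pct=$(df -P "$mount_point" | usage_pct)
    echo "$mount_point is ${pct}% full"
    # -x keeps du on one filesystem, -k reports sizes in KiB;
    # sort -n puts the largest entries last.
    du -xk "$mount_point" 2>/dev/null | sort -n | tail -n 10
}

# Example call (may need sudo to read everything under /var):
#   check_mount /var
```

Running `check_mount /var` on each master would show whether any of them is still close to full and which directories take up the space.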

[id@cluster ~]$ sudo systemctl | grep dcos | grep activating
dcos-adminrouter.service                                                                                   loaded activating auto-restart       Admin Router Master: exposes a unified control plane proxy for components and services using NGINX
dcos-backup-master.service                                                                                 loaded activating start-pre    start DC/OS Backup Master: backup & restore service
dcos-bouncer.service                                                                                       loaded activating auto-restart       DC/OS Identity and Access Manager (Bouncer): controls access to DC/OS components and services by managing users, user groups, service accounts, permissions, and identity providers
dcos-ca.service                                                                                            loaded activating start-pre    start DC/OS Certificate Authority: issues signed digital certificates for secure communication
dcos-cluster-linker.service                                                                                loaded activating auto-restart       DC/OS Cluster Linker Service: service for DC/OS Cluster Linker
dcos-cockroach.service                                                                                     loaded activating auto-restart       CockroachDB: Database for the DC/OS IAM
dcos-cosmos.service                                                                                        loaded activating auto-restart       DC/OS Package Manager (Cosmos): installs and manages DC/OS packages from DC/OS package repositories, such as the Mesosphere Universe
dcos-diagnostics.service                                                                                   loaded activating auto-restart       DC/OS Diagnostics Master: aggregates and exposes component health
dcos-history.service                                                                                       loaded activating auto-restart       DC/OS History: caches and exposes historical system state
dcos-licensing.service                                                                                     loaded activating auto-restart       DC/OS Licensing: licensing audit service
dcos-log-master.service                                                                                    loaded activating auto-restart       DC/OS Log Master: exposes master node and component logs
dcos-marathon.service                                                                                      loaded activating auto-restart       Marathon: container orchestration engine
dcos-mesos-dns.service                                                                                     loaded activating start-pre    start Mesos DNS: domain name based service discovery
dcos-mesos-master.service                                                                                  loaded activating start-pre    start Mesos Master: distributed systems kernel
dcos-metrics-master.service                                                                                loaded activating auto-restart       DC/OS Metrics Master: exposes node metrics
dcos-metronome.service                                                                                     loaded activating auto-restart       DC/OS Jobs (Metronome): job orchestration
dcos-net.service                                                                                           loaded activating auto-restart       DC/OS Net: A distributed systems & network overlay orchestration engine
dcos-secrets.service                                                                                       loaded activating auto-restart       DC/OS Secrets: provides a secure API for storing and retrieving secrets from Vault, a secret store
dcos-signal.service                                                                                        loaded activating auto-restart       DC/OS Signal: reports cluster telemetry and analytics to help improve DC/OS
dcos-vault.service 
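
The listing above mixes two sub-states: `auto-restart` (the service exited and systemd is waiting to start it again) and `start-pre` (a pre-start command is still running). A minimal sketch for counting the units in each sub-state follows; `summarize_dcos_units` is a hypothetical helper name, and it assumes the column layout shown above (unit, load, active, sub).

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl` unit listing on stdin and count
# dcos-* units that are "activating", grouped by sub-state (4th column).
summarize_dcos_units() {
    awk '$1 ~ /^dcos-/ && $3 == "activating" { count[$4]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Example call:
#   sudo systemctl --no-legend --no-pager | summarize_dcos_units
```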

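I tried restarting the services, but it did not help at all, so I now want to try reinstalling all of these masters to save troubleshooting time.

I get this error when restarting any of the services: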

[id@cluster ~]$ sudo systemctl restart  dcos-mesos-master.service
Job for dcos-mesos-master.service failed because the control process exited with error code. See "systemctl status dcos-mesos-master.service" and "journalctl -xe" for details.
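
The error points to `systemctl status` and `journalctl` for details. A minimal sketch for pulling the likely error lines out of a unit's recent log is shown here; `error_lines` and `show_unit_failure` are hypothetical helper names, and the keyword list is only a guess at what a disk-full failure might log.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines on stdin that look like errors
# (case-insensitive match on a small, assumed keyword list).
error_lines() {
    grep -iE 'error|fail|fatal|no space left'
}

# Hypothetical helper: show the unit status, then the error-like lines
# from the last 200 journal entries for that unit.
show_unit_failure() {
    unit="$1"
    systemctl status "$unit" --no-pager --full
    journalctl -u "$unit" --no-pager -n 200 | error_lines
}

# Example call:
#   sudo sh -c '. ./this_script.sh; show_unit_failure dcos-mesos-master.service'
```

Since several services are stuck in `start-pre`, the log of the pre-start step is probably the first place to look.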

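There is no data in the cluster, so it is essentially brand new, but the 3 slaves are already installed and working fine. My question is: will reinstalling the masters require me to reinstall the slaves as well? How bad would that be?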

0 answers