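I have an installed DC/OS cluster with 3 master nodes and 3 agent nodes. It worked fine until /var on one of the master nodes reached 100% disk usage. After that, "dcos auth login" stopped working, and I also receive an error message ("An error occurred") when trying to log in through the GUI.
Space has since been freed on /var, which is now at 84% usage, but the problem persists. After waiting a long time, I tried restarting the chronyd service and dcos.target, but that did not help.
Now the 3 masters have 20 services in the "activating" state, see below: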
[id@cluster ~]$ sudo systemctl | grep dcos | grep activating
dcos-adminrouter.service loaded activating auto-restart Admin Router Master: exposes a unified control plane proxy for components and services using NGINX
dcos-backup-master.service loaded activating start-pre start DC/OS Backup Master: backup & restore service
dcos-bouncer.service loaded activating auto-restart DC/OS Identity and Access Manager (Bouncer): controls access to DC/OS components and services by managing users, user groups, service accounts, permissions, and identity providers
dcos-ca.service loaded activating start-pre start DC/OS Certificate Authority: issues signed digital certificates for secure communication
dcos-cluster-linker.service loaded activating auto-restart DC/OS Cluster Linker Service: service for DC/OS Cluster Linker
dcos-cockroach.service loaded activating auto-restart CockroachDB: Database for the DC/OS IAM
dcos-cosmos.service loaded activating auto-restart DC/OS Package Manager (Cosmos): installs and manages DC/OS packages from DC/OS package repositories, such as the Mesosphere Universe
dcos-diagnostics.service loaded activating auto-restart DC/OS Diagnostics Master: aggregates and exposes component health
dcos-history.service loaded activating auto-restart DC/OS History: caches and exposes historical system state
dcos-licensing.service loaded activating auto-restart DC/OS Licensing: licensing audit service
dcos-log-master.service loaded activating auto-restart DC/OS Log Master: exposes master node and component logs
dcos-marathon.service loaded activating auto-restart Marathon: container orchestration engine
dcos-mesos-dns.service loaded activating start-pre start Mesos DNS: domain name based service discovery
dcos-mesos-master.service loaded activating start-pre start Mesos Master: distributed systems kernel
dcos-metrics-master.service loaded activating auto-restart DC/OS Metrics Master: exposes node metrics
dcos-metronome.service loaded activating auto-restart DC/OS Jobs (Metronome): job orchestration
dcos-net.service loaded activating auto-restart DC/OS Net: A distributed systems & network overlay orchestration engine
dcos-secrets.service loaded activating auto-restart DC/OS Secrets: provides a secure API for storing and retrieving secrets from Vault, a secret store
dcos-signal.service loaded activating auto-restart DC/OS Signal: reports cluster telemetry and analytics to help improve DC/OS
dcos-vault.service
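To make a listing like the one above easier to compare across the three masters, the units can be grouped by sub-state. This is only a minimal sketch: the helper name `summarize_activating` is hypothetical, and it assumes the usual `systemctl` column order (UNIT, LOAD, ACTIVE, SUB, description).

```shell
# Hypothetical helper: count dcos-* units in the "activating" state, grouped by sub-state.
# Reads systemctl listing lines on stdin; assumes columns UNIT LOAD ACTIVE SUB DESCRIPTION.
summarize_activating() {
  awk '$1 ~ /^dcos-/ && $3 == "activating" { count[$4]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Usage on a master node (--plain omits the bullet marker, --no-legend omits header/footer):
#   sudo systemctl --no-legend --plain | summarize_activating
```

Lines that lack the state columns are skipped, so a truncated entry does not affect the counts.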
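I tried restarting the services, but that did not help at all, so I now want to reinstall all of these masters to save troubleshooting time.
I get this error when restarting any of the services: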
[id@cluster ~]$ sudo systemctl restart dcos-mesos-master.service
Job for dcos-mesos-master.service failed because the control process exited with error code. See "systemctl status dcos-mesos-master.service" and "journalctl -xe" for details.
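There is no data in the cluster, so it is essentially brand new, but the 3 agents are already installed and working fine. My question is: will reinstalling the masters require me to reinstall the agents as well? How bad would that be?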