Readiness probe failing on the management proxy (mgmtproxy) pod

Asked: 2020-06-13 17:26:50

Tags: deployment azure-aks sql-server-2019

I am trying to set up a SQL Server Big Data Cluster (BDC) on AKS, but the deployment does not seem to progress past a certain point. The AKS cluster is a 3-node cluster built on a Standard_E8_v3 VM scale set.
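
For reference, the cluster was created roughly along these lines (a minimal sketch; the resource group name, cluster name, and region below are placeholders, not the actual values I used):

az group create --name myResourceGroup --location eastus
az aks create --resource-group myResourceGroup --name myAKSCluster --node-count 3 --node-vm-size Standard_E8_v3 --generate-ssh-keys
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster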

Here is the pod list:

C:\Users\rgn>kubectl get pods -n mssql-cluster

NAME              READY   STATUS    RESTARTS   AGE
control-qm754     3/3     Running   0          35m
controldb-0       2/2     Running   0          35m
controlwd-wxrlg   1/1     Running   0          32m
logsdb-0          1/1     Running   0          32m
logsui-mqfcv      1/1     Running   0          32m
metricsdb-0       1/1     Running   0          32m
metricsdc-9frbb   1/1     Running   0          32m
metricsdc-jr5hk   1/1     Running   0          32m
metricsdc-ls7mf   1/1     Running   0          32m
metricsui-pn9qf   1/1     Running   0          32m
mgmtproxy-x4ctb   2/2     Running   0          32m

When I run a describe against the mgmtproxy-x4ctb pod, below is what I see. Even though the status says it is Running, the pod is not actually ready (the readiness probe is failing). I believe this is why the deployment cannot proceed.

Events:
  Type     Reason     Age                From                                        Message
  ----     ------     ----               ----                                        -------
  Normal   Scheduled  11m                default-scheduler                           Successfully assigned mssql-cluster/mgmtproxy-x4ctb to aks-agentpool-34156060-vmss000002
  Normal   Pulling    11m                kubelet, aks-agentpool-34156060-vmss000002  Pulling image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
  Normal   Pulled     11m                kubelet, aks-agentpool-34156060-vmss000002  Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
  Normal   Created    11m                kubelet, aks-agentpool-34156060-vmss000002  Created container service-proxy
  Normal   Started    11m                kubelet, aks-agentpool-34156060-vmss000002  Started container service-proxy
  Normal   Pulling    11m                kubelet, aks-agentpool-34156060-vmss000002  Pulling image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
  Normal   Pulled     11m                kubelet, aks-agentpool-34156060-vmss000002  Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
  Normal   Created    11m                kubelet, aks-agentpool-34156060-vmss000002  Created container fluentbit
  Normal   Started    11m                kubelet, aks-agentpool-34156060-vmss000002  Started container fluentbit
  Warning  Unhealthy  10m (x6 over 11m)  kubelet, aks-agentpool-34156060-vmss000002  Readiness probe failed: cat: /var/run/container.ready: No such file or directory
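
Based on the event message, the readiness probe appears to simply cat /var/run/container.ready and fails because that file does not exist. To confirm what the probe actually runs and whether the marker file ever shows up, something along these lines can be used (the container names are taken from the events above):

kubectl get pod mgmtproxy-x4ctb -n mssql-cluster -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.readinessProbe}{"\n"}{end}'
kubectl exec mgmtproxy-x4ctb -n mssql-cluster -c fluentbit -- ls -l /var/run/container.ready
kubectl logs mgmtproxy-x4ctb -n mssql-cluster -c service-proxy --tail=50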

I have tried this twice, and both times it failed to get past this point. Judging from the link, it appears this problem has only come up within the last month. Can someone point me in the right direction?

Log listing from the proxy pod:

2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwxr-x' mode: [/var/opt /var/log /var/run/secrets /var/run/secrets/keytabs /var/run/secrets/certificates /var/run/secrets/credentials /var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwx---' mode: [/var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/mgmtproxy.json
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/agent.json
2020/06/13 16:25:35.777955 Changed the container umask from '-----w--w-' to '--------w-'
2020/06/13 16:25:35.778031 Setting the directories for 'supervisor:supervisor' owner with '-rwxrwx---' mode: [/var/log/supervisor/log /var/opt/supervisor /var/log/supervisor /var/run/supervisor]
2020/06/13 16:25:35.778170 Setting the directories for 'fluentbit:fluentbit' owner with '-rwxrwx---' mode: [/var/opt/fluentbit /var/log/fluentbit /var/run/fluentbit]
2020/06/13 16:25:35.778411 Agent configuration: {"PodType":"mgmtproxy","ContainerName":"fluentbit","GrpcPort":8311,"HttpsPort":8411,"ScaledSetKind":"ReplicaSet","securityPolicy":"certificate","dnsServicesToWaitFor":null,"cronJobs":null,"serviceJobs":null,"healthModules":null,"logRotation":{"agentLogMaxSize":500,"agentLogRotateCount":3,"serviceLogRotateCount":10},"fileMap":{"fluentbit-certificate.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem","fluentbit-privatekey.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem","krb5.conf":"/etc/krb5.conf","nsswitch.conf":"/etc/nsswitch.conf","resolv.conf":"/etc/resolv.conf","smb.conf":"/etc/samba/smb.conf"},"userPermissions":{"agent":{"user":"agent","group":"agent","mode":"0770","modeSetgid":false,"directories":[]},"fluentbit":{"user":"fluentbit","group":"","mode":"","modeSetgid":false,"directories":[]},"fundamental":{"user":"agent","group":"agent","mode":"0775","modeSetgid":false,"directories":["/var/opt","/var/log","/var/run/secrets","/var/run/secrets/keytabs","/var/run/secrets/certificates","/var/run/secrets/credentials"]},"supervisor":{"user":"supervisor","group":"supervisor","mode":"0770","modeSetgid":false,"directories":["/var/log/supervisor/log"]}},"fileIgnoreList":["agent-certificate.pem","agent-privatekey.pem"],"InstanceId":"t4KLx1m5vDsHCHc038KgKHH5HOcQVR0Z","ContainerId":"","StartServicesImmediately":false,"DisableFileDownloads":false,"DisableHealthChecks":false,"serviceFencingEnabled":false,"isPrivileged":true,"IsConfigurationManagerEnabled":false,"LWriter":{"filename":"/var/log/agent/agent.log","maxsize":500,"maxage":0,"maxbackups":10,"localtime":true,"compress":false}}
2020/06/13 16:25:36.316209 Attempting to join cluster...
2020/06/13 16:25:36.316301 Source directory /var/opt/secrets/certificates/ca does not exist
2020/06/13 16:25:36.316520 [Reaper] Starting the signal loop for reaper
2020/06/13 16:25:40.642164 [Reaper] Received SIGCHLD signal. Starting process reaper.
2020/06/13 16:25:40.652703 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.943805 Cluster join successful.
2020/06/13 16:25:40.943846 Stopping gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.944704 Getting manifest from controller...
2020/06/13 16:25:40.964774 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-certificate.pem' from controller...
2020/06/13 16:25:40.964816 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-privatekey.pem' from controller...
2020/06/13 16:25:40.987309 Stored 1206 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem
2020/06/13 16:25:40.992108 Stored 1694 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem
2020/06/13 16:25:40.992235 Agent is ready.
2020/06/13 16:25:40.992348 Starting supervisord with command: '[supervisord --nodaemon -c /etc/supervisord.conf]'
2020/06/13 16:25:40.992719 Started supervisord with pid=1437
2020/06/13 16:25:40.993030 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.996580 Starting HTTPS listener on 0.0.0.0:8411
2020/06/13 16:25:41.998667 [READINESS] Not all supervisord processes are ready. Attempts: 1, Max attempts: 250
2020/06/13 16:25:41.999567 Loading go plugin plugins/bdc.so
2020/06/13 16:25:41.999588 Loading go plugin plugins/platform.so
2020/06/13 16:25:41.999600 Starting the health monitoring, number of modules: 2, services: ["fluentbit","agent"]
2020/06/13 16:25:41.999605 Starting the health service
2020/06/13 16:25:41.999609 Starting the health durable store
2020/06/13 16:25:41.999614 Loading existing health properties from /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:41.999642 No existing file path for file: /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:42.640719 Adding a new plugin plugins/bdc.so 
2020/06/13 16:25:43.302872 Adding a new plugin plugins/platform.so 
2020/06/13 16:25:43.302932 Created a health module watcher for service 'fluentbit'
2020/06/13 16:25:43.302948 Starting a new watcher for health module: fluentbit 
2020/06/13 16:25:43.302983 Starting a new watcher for health module: agent 
2020/06/13 16:25:43.302992 Health monitoring started
2020/06/13 16:25:53.000908 [READINESS] All services marked as ready.
2020/06/13 16:25:53.000966 [READINESS] Container is now ready.
2020/06/13 16:26:01.995093 [MONITOR] Service states: map[fluentbit:RUNNING]
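
Note that toward the end of this log the agent reports "[READINESS] Container is now ready", which suggests the marker file the probe looks for does eventually get written. Watching the pod, for example with the commands below, would show whether it ever flips to READY:

kubectl get pods -n mssql-cluster -w
kubectl get pod mgmtproxy-x4ctb -n mssql-cluster -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'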

1 Answer:

Answer 0 (score: 0)

All,

It finally got sorted out.

There were a few issues with our Azure policies and our network policies.

(1) The policy was not allowing new IP addresses to be assigned to the load balancer.
(2) The gateway proxy was not getting new IP addresses because we had run out of our quota of 10 maximum allowed IPs.
(3) The desktop from which I started the deployment was not able to reach the controller service IP address and port.

We resolved the above issues one by one, and we are now in the final stage.
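
For anyone hitting the same symptoms, the checks that helped us pin these down looked roughly like this (the region, IP, and port values are placeholders):

# Public IP usage against the subscription quota for the region
az network list-usages --location eastus --output table
# external IPs the load balancer actually handed out to the BDC services
kubectl get svc -n mssql-cluster
# basic reachability of the controller endpoint from the deploy machine
curl -k https://<controller-external-ip>:<controller-port>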

Since the IP addresses are static but generated on the fly, configuring them in advance is not possible. How have others worked through this with their network/Azure infrastructure teams?
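
One approach we are looking at (sketch only; the public IP name, node resource group, and the external service name below are assumptions on my part, not taken from the BDC deployment profile) is to pre-create a static public IP and point the exposed LoadBalancer service at it:

# pre-create a static public IP in the AKS node resource group
az network public-ip create --resource-group MC_myResourceGroup_myAKSCluster_eastus --name bdc-gateway-ip --sku Standard --allocation-method Static
# patch the externally exposed service to pin it to that IP (service name assumed)
kubectl patch svc gateway-svc-external -n mssql-cluster -p '{"spec":{"loadBalancerIP":"<the-static-ip>"}}'

If the static IP lives in a resource group other than the AKS node resource group, the service also needs the service.beta.kubernetes.io/azure-load-balancer-resource-group annotation so AKS can find it.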

Thanks, rgn