Question

Our origin-node.service on the master node fails with:

root@master> systemctl start origin-node.service
Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.

root@master> systemctl status origin-node.service -l

[...]
May 05 07:17:47 master origin-node[44066]: bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 05 07:17:47 master origin-node[44066]: bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 05 07:17:47 master origin-node[44066]: certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 05 07:17:47 master origin-node[44066]: server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF

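So it seems that kubelet-client-current.pem and/or kubelet-server-current.pem contain an expired certificate, and the service tries to create a CSR against an endpoint that is probably not available yet (because the master is down). We tried to redeploy the certificates following the OpenShift documentation "Redeploying Certificates", but the playbook fails when it detects the expired certificates: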

root@master> ansible-playbook -i /etc/ansible/hosts  openshift-master/redeploy-openshift-ca.yml

[...]
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] *******************************************************************************************************************************************
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200505T042754.html or /root/cert-expiry-report.20200505T042754.json.\n"}
[...]



root@master> cat /root/cert-expiry-report.20200505T042754.json

[...]
      "kubeconfigs": [
        {
          "cert_cn": "O:system:cluster-admins, CN:system:admin",
          "days_remaining": -75,
          "expiry": "2020-02-20 13:14:27",
          "health": "expired",
          "issuer": "CN=openshift-signer@1519045219 ",
          "path": "/etc/origin/node/node.kubeconfig",
          "serial": 27,
          "serial_hex": "0x1b"
        },
        {
          "cert_cn": "O:system:cluster-admins, CN:system:admin",
          "days_remaining": -75,
          "expiry": "2020-02-20 13:14:27",
          "health": "expired",
          "issuer": "CN=openshift-signer@1519045219 ",
          "path": "/etc/origin/node/node.kubeconfig",
          "serial": 27,
          "serial_hex": "0x1b"
        },
[...]

  "summary": {
    "expired": 2,
    "ok": 22,
    "total": 24,
    "warning": 0
  }
}
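
The days_remaining value in the report can be reproduced by hand, which is a quick way to confirm that a given certificate really is expired. The following is a minimal sketch, assuming GNU date is available; days_remaining is a hypothetical helper and not part of openshift-ansible, and the "now" timestamp in the first example is an assumed time on the day the report was generated.

```shell
#!/bin/sh
# days_remaining EXPIRY [NOW]
# Prints the whole number of days from NOW until EXPIRY, rounded down,
# so an already-expired certificate yields a negative number.
# Both arguments are timestamps GNU date can parse; NOW defaults to the current time.
days_remaining() {
    _exp_s=$(date -u -d "$1" +%s) || return 1
    _now_s=$(date -u -d "${2:-now}" +%s) || return 1
    _diff=$(( _exp_s - _now_s ))
    _days=$(( _diff / 86400 ))
    # Shell division truncates toward zero; adjust so negative values round down.
    if [ $(( _diff % 86400 )) -lt 0 ]; then
        _days=$(( _days - 1 ))
    fi
    echo "$_days"
}

# Example with the expiry from the report and an assumed "now" on 2020-05-05:
#   days_remaining "2020-02-20 13:14:27 UTC" "2020-05-05 08:27:54 UTC"
#
# Example against a certificate file (openssl prints "notAfter=<date>"):
#   end=$(openssl x509 -noout -enddate -in /etc/origin/node/certificates/kubelet-client-current.pem)
#   days_remaining "${end#notAfter=}"
```

With the assumed timestamps the first example yields -75, which agrees with the report above.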

There is a guide for OpenShift 4.4, "Recovering from expired control plane certificates", but it does not apply to 3.11, and we have not found an equivalent guide for our version.

Is it possible to recreate the expired certificates on 3.11 without a running master node? Thanks for any help.

OpenShift Ansible: https://github.com/openshift/openshift-ansible/releases/tag/openshift-ansible-3.11.153-2

Update 2020-05-06: I also ran redeploy-certificates.yml, but it fails at the same task:

root@master> ansible-playbook -i /etc/ansible/hosts playbooks/redeploy-certificates.yml

[...]

TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ******************************************************************************
Wednesday 06 May 2020  04:07:06 -0400 (0:00:00.909)       0:01:07.582 ********* 
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200506T040603.html or /root/cert-expiry-report.20200506T040603.json.\n"}

Update 2020-05-11: Running with -e openshift_certificate_expiry_fail_on_warn=False results in:

root@master> ansible-playbook -i /etc/ansible/hosts -e openshift_certificate_expiry_fail_on_warn=False playbooks/redeploy-certificates.yml

[...]

TASK [Wait for master API to come back online] *****************************************************************************************************************
Monday 11 May 2020  03:48:56 -0400 (0:00:00.111)       0:02:25.186 ************ 
skipping: [master.openshift-cluster.mydomain.com]

TASK [openshift_control_plane : restart master] ****************************************************************************************************************
Monday 11 May 2020  03:48:56 -0400 (0:00:00.257)       0:02:25.444 ************ 
changed: [master.openshift-cluster.mydomain.com] => (item=api)
changed: [master.openshift-cluster.mydomain.com] => (item=controllers)

RUNNING HANDLER [openshift_control_plane : verify API server] **************************************************************************************************
Monday 11 May 2020  03:48:57 -0400 (0:00:00.945)       0:02:26.389 ************ 
FAILED - RETRYING: verify API server (120 retries left).
FAILED - RETRYING: verify API server (119 retries left).
[...]
FAILED - RETRYING: verify API server (1 retries left).
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "--silent", "--tlsv1.2", "--max-time", "2", "--cacert", "/etc/origin/master/ca-bundle.crt", "https://lb.openshift-cluster.mydomain.com:8443/healthz/ready"], "delta": "0:00:00.182367", "end": "2020-05-11 03:51:52.245644", "msg": "non-zero return code", "rc": 35, "start": "2020-05-11 03:51:52.063277", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}



root@master> systemctl status origin-node.service -l

[...]


May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: E0511 04:23:28.077964  109972 bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.078001  109972 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.080555  109972 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: F0511 04:23:28.130968  109972 server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF

[...]

Answer 1

The openshift_certificate_expiry role uses the openshift_certificate_expiry_fail_on_warn variable to decide whether the playbook should fail when the number of days remaining is less than openshift_certificate_expiry_warning_days.

So try running redeploy-certificates.yml with this extra variable set to "False":

ansible-playbook -i /etc/ansible/hosts -e openshift_certificate_expiry_fail_on_warn=False playbooks/redeploy-certificates.yml
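
The decision described above can be sketched roughly as follows. This is an illustration, not the role's actual code: would_fail is a hypothetical helper, and it assumes the task aborts only when the flag is true and the report summary counts at least one expired or warning certificate.

```shell
#!/bin/sh
# would_fail EXPIRED WARNING FAIL_ON_WARN
# Returns 0 (true) if the expiry check would abort the playbook under the
# assumption stated above, 1 otherwise.
would_fail() {
    case "$3" in
        [Tt]rue|yes|1) ;;      # flag enabled: go on to check the counts
        *) return 1 ;;         # flag disabled: never abort
    esac
    [ "$1" -gt 0 ] || [ "$2" -gt 0 ]
}

# Summary from the report in the question: 2 expired, 0 warning.
if would_fail 2 0 True; then
    echo "fail_on_warn=True: playbook aborts"
fi
if ! would_fail 2 0 False; then
    echo "fail_on_warn=False: playbook continues"
fi
```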

Answer 2

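I had the same situation in a customer environment. The error occurs because the certificate has expired. I "cheated" by setting the OS date back to before the expiry date, and the origin-node service then started on my master: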

 systemctl status origin-node
● origin-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
   Active: active (running) since Sáb 2021-02-20 20:22:21 -02; 6min ago
     Docs: https://github.com/openshift/origin
 Main PID: 37230 (hyperkube)
   Memory: 79.0M
   CGroup: /system.slice/origin-node.service
           └─37230 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-c...
Você tem mensagem de correio em /var/spool/mail/okd
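
The clock change described above can be sketched as follows. This is only an illustration under assumptions: GNU date is available, date_before_expiry is a hypothetical helper, and the one-day margin is an arbitrary choice. The commands that actually change the clock are left commented out because they need root and affect everything on the host; time synchronisation has to be disabled first, otherwise the clock is corrected again.

```shell
#!/bin/sh
# date_before_expiry EXPIRY [MARGIN_SECONDS]
# Prints a UTC timestamp MARGIN_SECONDS (default: one day) before EXPIRY.
date_before_expiry() {
    _expiry_s=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$(( _expiry_s - ${2:-86400} ))" '+%Y-%m-%d %H:%M:%S'
}

# Expiry taken from the kubelet log in the question.
target=$(date_before_expiry "2020-02-20 13:14:27 UTC")
echo "$target"

# To apply it (as root), assuming a systemd host:
#   timedatectl set-ntp false
#   date -u -s "$target"
#   systemctl start origin-node.service
```

Once the certificates have been renewed, the clock and time synchronisation would need to be restored.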

How can I manually recreate the bootstrap client certificate for an OpenShift 3.11 master?
