runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Date: 2019-12-27 05:15:24

Tags: google-kubernetes-engine rollback

What happened:

"runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"

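What you expected to happen:

  • The upgrade should work
  • The rollback should work
  • After resizing to 2 nodes, all services should come up

Steps to reproduce: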

Running GKE
Master version 1.14.8-gke.12
Node version: 1.14.8-gke.2
Machine type n1-standard-8

Everything was working perfectly before this upgrade. Then:

1)

    gcloud beta container node-pools update k-cpu-pool-v1 --cluster=k --workload-metadata-from-node=GKE_METADATA_SERVER --zone=us-central1-a # fails with 2nd node
    gcloud beta container node-pools rollback k-cpu-pool-v1 --cluster=k3 --zone=us-central1-a # also fails with 2nd node and many deployments won't come up

2) Trying to "Enable metadata server" per the instructions at
https://medium.com/@louisvernon/mapping-kubernetes-service-accounts-to-gcp-iams-using-workload-identity-b53496d543e0
but blocked by the failure of the previous deployment.

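Anything else we need to know (workarounds you tried, documentation you referenced, etc.):

I looked through the Google forum issues but found nothing. This looks like a
GKE issue with rollback when an upgrade fails, so a double issue. Should the
master and nodes be upgraded to the same version?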
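
As a rough illustration of the version question: the master and node versions listed above differ only in the GKE patch suffix (`gke.12` vs `gke.2`). Below is a minimal sketch for comparing such strings, assuming they always follow the `X.Y.Z-gke.N` pattern seen here. The helper names `parse_gke_version` and `describe_skew` are hypothetical and not part of any GKE tooling, and the sketch says nothing about whether this difference is related to the error.

```python
import re
from typing import NamedTuple


class GkeVersion(NamedTuple):
    """Numeric parts of a version string such as '1.14.8-gke.12'."""
    major: int
    minor: int
    patch: int
    gke_patch: int


# Assumption: versions always look like X.Y.Z-gke.N, as in the question.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(text: str) -> GkeVersion:
    """Parse a version string into integers so that tuples compare numerically."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {text!r}")
    return GkeVersion(*(int(part) for part in match.groups()))


def describe_skew(master: str, node: str) -> str:
    """Describe how far apart the master and node versions are."""
    m, n = parse_gke_version(master), parse_gke_version(node)
    if m == n:
        return "identical"
    if m[:3] == n[:3]:
        return "same Kubernetes version, different GKE patch"
    if m[:2] == n[:2]:
        return "same minor version, different patch"
    return "different minor version"
```

For the versions in this question, `describe_skew("1.14.8-gke.12", "1.14.8-gke.2")` reports that only the GKE patch differs.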

It doesn't seem to be the issue below, because in GKE one node came up but the second does not: https://stackoverflow.com/questions/52675934/network-plugin-is-not-ready-cni-config-uninitialized

1 answer:

Answer 0 (score: 0)

I tried to reproduce your issue:

  1. Create the cluster and node pool:

    gcloud container clusters create test-cluster --zone us-central1-a --cluster-version 1.14.8-gke.12 --node-version 1.14.8-gke.2 --num-nodes=2
    
    WARNING: Currently VPC-native is not the default mode during cluster creation. In the future, this will become the default mode and can be disabled using `--no-enable-ip-alias` flag. Use `--[no-]enable-ip-alias` flag to suppress this warning.
    WARNING: Newly created clusters and node-pools will have node auto-upgrade enabled by default. This can be disabled using the `--no-enable-autoupgrade` flag.
    WARNING: Starting in 1.12, default node pools in new clusters will have their legacy Compute Engine instance metadata endpoints disabled by default. To create a cluster with legacy instance metadata endpoints disabled in the default node pool, run `clusters create` with the flag `--metadata disable-legacy-endpoints=true`.
    WARNING: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s). 
    This will enable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
    Creating cluster test-cluster in us-central1-a... Cluster is being health-checked (master is healthy)...done.              
    Created [https://container.googleapis.com/v1/projects/test-prj/zones/us-central1-a/clusters/test-cluster].
    To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-central1-a/test-cluster?project=test-prj
    
    NAME          LOCATION       MASTER_VERSION  MASTER_IP     MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
    test-cluster  us-central1-a  1.14.8-gke.12   XX.XX.75.247  n1-standard-1  1.14.8-gke.2  2          RUNNING
    
  2. Enable Workload Identity (beta) via the UI

Workload Identity Enabled

  3. Scale up to 3 nodes

    gcloud container clusters resize test-cluster --node-pool default-pool --num-nodes=3 --zone=us-central1-a
    
    Pool [default-pool] for [test-cluster] will be resized to 3.
    Do you want to continue (Y/n)?  y
    Resizing test-cluster...done.                                                                                              
    Updated [https://container.googleapis.com/v1/projects/test-prj/zones/us-central1-a/clusters/test-cluster].
    
  4. Update the node pool

    gcloud beta container node-pools update default-pool --cluster=test-cluster --workload-metadata-from-node=GKE_METADATA_SERVER --zone=us-central1-a
    
    Updating node pool default-pool... Done with 3 out of 3 nodes (100.0%): 3 succeeded...done.                                       
    Updated [https://container.googleapis.com/v1beta1/projects/test-prj/zones/us-central1-a/clusters/test-cluster/nodePools/default-pool].
    
  5. Scale down to 2 nodes

    gcloud container clusters resize test-cluster --node-pool default-pool --num-nodes=2 --zone=us-central1-a
    
    Pool [default-pool] for [test-cluster] will be resized to 2.
    Do you want to continue (Y/n)?  y
    Resizing test-cluster...done.                                                                                              
    Updated [https://container.googleapis.com/v1/projects/test-prj/zones/us-central1-a/clusters/test-cluster].
    
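  6. Disable Workload Identity (beta)

     6.1. First, go to Kubernetes clusters and click your cluster -> under Clusters, go to Node pools and click default-pool -> Edit node pool -> under Edit default-pool, go to Security and uncheck Enable GKE Metadata Server (beta).

     6.2. Then go to Kubernetes clusters and click your cluster -> under Clusters, click Edit and set Workload Identity (beta) to Disabled.

I ran all of these commands on my test cluster and found no errors or network problems. After that, I repeated steps 2-5 and then rolled back the node pool.

Again there were no errors and no network problems. I was then able to disable Workload Identity (beta) via the UI as described in step 6.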
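
Since the reproduction was run more than once, the command-line steps above (create, scale up, update the node pool, scale down) could be collected in a small script. This is only a sketch: the flags are copied from the commands shown above, while the function names `build_steps` and `run_steps` and the dry-run default are my own assumptions. The UI steps (enabling and disabling Workload Identity) are not covered.

```python
import shlex
import subprocess

# Names taken from the reproduction above.
CLUSTER = "test-cluster"
POOL = "default-pool"
ZONE = "us-central1-a"


def build_steps(cluster=CLUSTER, pool=POOL, zone=ZONE):
    """Return the gcloud invocations (create, scale up, update, scale down)
    as argument lists, in the order they were run above."""
    resize = ["gcloud", "container", "clusters", "resize", cluster,
              "--node-pool", pool, f"--zone={zone}"]
    return [
        ["gcloud", "container", "clusters", "create", cluster,
         "--zone", zone, "--cluster-version", "1.14.8-gke.12",
         "--node-version", "1.14.8-gke.2", "--num-nodes=2"],
        resize + ["--num-nodes=3"],
        ["gcloud", "beta", "container", "node-pools", "update", pool,
         f"--cluster={cluster}",
         "--workload-metadata-from-node=GKE_METADATA_SERVER",
         f"--zone={zone}"],
        resize + ["--num-nodes=2"],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    The resize commands ask for confirmation (Y/n), so a real run is
    expected to happen in an interactive terminal.
    """
    for argv in steps:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)
```

Calling `run_steps(build_steps())` only prints the commands. Passing `dry_run=False` would execute them against the active gcloud project, stopping at the first step that fails.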

It looks like everything works as expected, so the problem is probably something specific to your configuration.
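
To narrow down which node is affected, one option is to look at the Ready condition each node reports. Below is a minimal sketch that filters the output of `kubectl get nodes -o json` for the message from the question. It assumes `kubectl` is configured for the cluster, and the function names `nodes_with_cni_problem` and `fetch_nodes` are hypothetical.

```python
import json
import subprocess

# Fragment of the kubelet message quoted in the question.
CNI_HINT = "cni config uninitialized"


def nodes_with_cni_problem(node_list: dict) -> list:
    """Return (node name, message) pairs for nodes whose Ready condition
    is not "True" and whose message mentions the CNI error."""
    affected = []
    for node in node_list.get("items", []):
        name = node.get("metadata", {}).get("name", "<unknown>")
        for cond in node.get("status", {}).get("conditions", []):
            if cond.get("type") != "Ready":
                continue
            message = cond.get("message", "")
            if cond.get("status") != "True" and CNI_HINT in message:
                affected.append((name, message))
    return affected


def fetch_nodes() -> dict:
    """Read the node list from the current kubectl context."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

With a working `kubectl` context, `nodes_with_cni_problem(fetch_nodes())` would list the nodes still reporting the CNI message. An empty list means no node currently reports it.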