Unable to create a Windows node pool on a GKE cluster with the Google Terraform GKE module

Date: 2020-08-17 21:25:55

Tags: kubernetes terraform google-kubernetes-engine

I'm trying to provision a GKE cluster with a Windows node_pool using the Google module. I'm calling the module with:

  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
  version = "9.2.0"

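I have to define two pools: a Linux pool, which GKE requires, and the Windows pool we need. Terraform always provisions the Linux node_pool successfully, but it fails to provision the Windows one with this error: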

module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m31s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m41s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m51s elapsed]
module.gke.google_container_cluster.primary: Modifications complete after 24m58s [id=projects/xx-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev]
module.gke.google_container_node_pool.pools["windows-node-pool"]: Creating...

Error: error creating NodePool: googleapi: Error 400: Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA., badRequest

  on .terraform\modules\gke\terraform-google-kubernetes-engine-9.2.0\modules\beta-private-cluster-update-variant\cluster.tf line 341, in resource "google_container_node_pool" "pools":
 341: resource "google_container_node_pool" "pools" {

I tried setting that metadata value in many places, but none of them seems to be right:

On the Terraform side:

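I tried adding this metadata in several places: in the node_config scope of the module itself, in the main.tf file where I call the module, and in the Windows node_pool's scope inside the node_pools list. None of them was accepted; I got a message saying that WORKLOAD IDENTITY is not expected to be set there.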

I also tried setting enable_shielded_nodes = false, but that didn't help.
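The setting the API error refers to is the node pool's workload metadata configuration (node_config.workload_metadata_config in the Terraform provider), not a key in the instance metadata map. Below is a minimal sketch of the Windows pool as a standalone resource outside the module, assuming the google-beta provider 3.x, where the field is node_metadata and "EXPOSE" is the value corresponding to gcloud's --workload-metadata=GCE_METADATA (provider 4.x replaced it with mode = "GCE_METADATA"). The module.gke.name and module.gke.location outputs and the google_service_account.gke_cluster_sa resource are assumed from the configuration shown further down. Setting service_account explicitly also avoids falling back to the Compute Engine default service account.

```hcl
# Sketch only: assumes the google-beta provider 3.x and the module outputs
# module.gke.name / module.gke.location.
resource "google_container_node_pool" "windows" {
  provider = google-beta

  name     = "windows-node-pool"
  cluster  = module.gke.name
  location = module.gke.location

  initial_node_count = 1

  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  node_config {
    machine_type = var.nodepool_instance_type
    image_type   = "WINDOWS_SAC"
    disk_size_gb = 100
    disk_type    = "pd-standard"

    # Explicit node service account instead of the Compute Engine default one
    service_account = google_service_account.gke_cluster_sa.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]

    metadata = {
      "disable-legacy-endpoints" = "true"
    }

    # Windows nodes do not support Workload Identity, so expose the GCE
    # metadata server instead of the GKE metadata server.
    # Provider 3.x: node_metadata = "EXPOSE"; provider 4.x: mode = "GCE_METADATA".
    workload_metadata_config {
      node_metadata = "EXPOSE"
    }
  }
}
```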

I even tried the command line to test whether it could work at all (these are my commands):

C:\>gcloud container node-pools --region europe-west2 list
NAME                    MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
default-node-pool-d916  n1-standard-2  100           1.17.9-gke.600

 
C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2
WARNING: Starting in 1.12, new node pools will be created with their legacy Compute Engine instance metadata APIs disabled by default. To create a node pool with legacy instance metadata endpoints disabled, run `node-pools create` with the flag `--metadata disable-legacy-endpoints=true`.
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA.

C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2 --workload-metadata=GCE_METADATA --metadata disable-legacy-endpoints=true
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Service account "874988475980-compute@developer.gserviceaccount.com" does not exist.

C:\>gcloud auth list
                       Credentialed Accounts
ACTIVE  ACCOUNT
*       tf-xxx-xxx-xx-xxx@xx-xxx-xx-xxx-xxxx.iam.gserviceaccount.com

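The service account shown by gcloud auth list is the one I use to run Terraform, but I don't know where the service account in the error message comes from. So even creating the Windows node pool from the command line, as shown above, doesn't work. I'm stuck and don't know what to try next.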

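Since module 9.2.0 has been stable for all the Linux-based clusters we set up before, I thought it might simply be too old for a Windows node_pool, so I tried 11.0.0 to see whether that made a difference, but it ended with these errors: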

module.gke.google_container_node_pool.pools["default-node-pool"]: Refreshing state... [id=projects/uk-tix-p1-npe-b821/locations/europe-west2/clusters/gke-nonpci-dev/nodePools/default-node-pool-d916]

Error: failed to execute ".terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh: %1 is not a valid Win32 application.

  on .terraform\modules\gke.gcloud_delete_default_kube_dns_configmap\terraform-google-gcloud-1.4.1\main.tf line 70, in data "external" "env_override":
  70: data "external" "env_override" {

Error: failed to execute ".terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh: %1 is not a valid Win32 application.

  on .terraform\modules\gke.gcloud_wait_for_cluster\terraform-google-gcloud-1.3.0\main.tf line 70, in data "external" "env_override":
  70: data "external" "env_override" {

This is how I set the node_pools arguments:


  node_pools = [
    {
      name               = "linux-node-pool"
      machine_type       = var.nodepool_instance_type
      min_count          = 1
      max_count          = 10
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = "COS"                                  
      auto_repair        = true                                   
      auto_upgrade       = true                                 
      service_account    = google_service_account.gke_cluster_sa.email
      preemptible        = var.preemptible
      initial_node_count = 1
    },
    {
      name               = "windows-node-pool"
      machine_type       = var.nodepool_instance_type
      min_count          = 1
      max_count          = 10
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = var.nodepool_image_type                
      auto_repair        = true                                   
      auto_upgrade       = true                                   
      service_account    = google_service_account.gke_cluster_sa.email
      preemptible        = var.preemptible
      initial_node_count = 1
  
    }
  ]

  cluster_resource_labels = var.cluster_resource_labels           

  # health check and webhook firewall rules
  node_pools_tags = {
    all = [
      "xx-xxx-xxx-local-xxx",
    ]
  }

  node_pools_metadata = {
    all = {
//      workload-metadata = "GCE_METADATA"
    }

    linux-node-pool = {
      ssh-keys = join("\n", [for user, key in var.node_ssh_keys : "${user}:${key}"])
      block-project-ssh-keys = true
    }

    windows-node-pool = {
      workload-metadata = "GCE_METADATA"
    }

  }
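Note that node_pools_metadata is passed to the nodes as GCE instance metadata (that is where ssh-keys ends up), so a workload-metadata entry there is just another key/value pair and does not change the pool's workload metadata setting. Below is a minimal sketch of a per-pool alternative, assuming a module release whose node_pools entries accept an optional node_metadata key (later releases of terraform-google-kubernetes-engine document one; check the README of the version you pin, since the accepted values also changed between provider versions). The version string is a hypothetical placeholder, and arguments not relevant here are elided in comments.

```hcl
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
  version = "x.y.z" # hypothetical placeholder: a release with per-pool node_metadata

  # project_id, name, region, network, ... unchanged from the configuration above

  node_pools = [
    {
      name       = "linux-node-pool"
      image_type = "COS"
      # Remaining keys as in the original list. No node_metadata key,
      # so this pool keeps the cluster-wide setting (Workload Identity).
    },
    {
      name       = "windows-node-pool"
      image_type = var.nodepool_image_type
      # Remaining keys as in the original list.
      # Assumed per-pool key: "EXPOSE" (older providers) or "GCE_METADATA"
      # (newer ones) turns off the GKE metadata server for this pool only.
      node_metadata = "EXPOSE"
    },
  ]

  node_pools_metadata = {
    all = {}

    linux-node-pool = {
      ssh-keys               = join("\n", [for user, key in var.node_ssh_keys : "${user}:${key}"])
      block-project-ssh-keys = true
    }

    # Instance metadata only; the workload-metadata key is dropped.
    windows-node-pool = {}
  }
}
```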

  • This is a shared VPC, and the cluster version I'm configuring is 1.17.9-gke.600.

1 answer:

Answer 0 (score: 1)

Check out https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/issues/632 for a solution.

The error message is unclear, and GKE has an internal bug tracking this issue. We will improve the error message soon.