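I know there are similar issues on GitHub for the Terraform Google provider concerning the idempotency of google_container_cluster; however, none of them seems to match my simple example. Every attempt to apply the Terraform plan wants to destroy and recreate my cluster, which takes more than 6 minutes.
Nothing about the cluster has visibly changed, but the Terraform state shows the cluster's ID as the cluster name while the new ID is computed; as a result, the cluster has to be recreated. Can I prevent this?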
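I followed the suggested example for setting up a cluster: define the cluster with remove_default_node_pool = true and initial_node_count = 1, then create an explicit node pool as a dependent resource. I also tried creating a default cluster with the initial node pool. I have not set any of the other attributes that have been linked to idempotency problems (such as master_ipv4_cidr_block).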
Here is the basic Terraform setup. I am using Terraform v0.11.13 and provider.google v2.6.0.
provider "google" {
  project = "${var.google_project}"
  region  = "${var.google_region}"
  zone    = "${var.google_zone}"
}

resource "google_container_cluster" "cluster" {
  project  = "${var.google_project}"
  name     = "${var.cluster_name}"
  location = "${var.google_region}"

  remove_default_node_pool = true
  initial_node_count       = 1

  master_auth {
    username = ""
    password = ""
  }

  timeouts {
    create = "20m"
    update = "15m"
    delete = "15m"
  }
}

resource "google_container_node_pool" "cluster_nodes" {
  name       = "${var.cluster_name}-node-pool"
  cluster    = "${google_container_cluster.cluster.name}"
  node_count = "${var.cluster_node_count}"

  node_config {
    preemptible  = "${var.preemptible}"
    disk_size_gb = "${var.disk_size_gb}"
    disk_type    = "${var.disk_type}"
    machine_type = "${var.machine_type}"

    oauth_scopes = [
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/service.management.readonly",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/trace.append",
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }

  timeouts {
    create = "20m"
    update = "15m"
    delete = "15m"
  }
}

output "cluster_ca_certificate" {
  value = "${google_container_cluster.cluster.master_auth.0.cluster_ca_certificate}"
}

output "host" {
  value = "${google_container_cluster.cluster.endpoint}"
}

provider "kubernetes" {
  host                   = "${google_container_cluster.cluster.endpoint}"
  client_certificate     = "${base64decode(google_container_cluster.cluster.master_auth.0.client_certificate)}"
  client_key             = "${base64decode(google_container_cluster.cluster.master_auth.0.client_key)}"
  cluster_ca_certificate = "${base64decode(google_container_cluster.cluster.master_auth.0.cluster_ca_certificate)}"
}

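And so on. Not shown are the service account and cluster role binding that enable the Helm service account, nor the Helm release. I don't think they matter here.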
If I run terraform apply twice, the second invocation wants to destroy the cluster and create a new one. Nothing has changed, so this should not happen.
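Normally I could live with this, except that I tend to see a lot of timeouts from the Terraform provider and have to re-apply. That does not help, because re-applying causes the cluster to be destroyed and recreated.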
The output of terraform apply is as follows:
terraform-gke$ terraform apply
data.template_file.gke_values: Refreshing state...
google_container_cluster.cluster: Refreshing state... (ID: test-eric)

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
-/+ destroy and then create replacement

Terraform will perform the following actions:

-/+ google_container_cluster.cluster (new resource required)
      id: "test-eric" => <computed> (forces new resource)
      additional_zones.#: "3" => <computed>
      addons_config.#: "1" => <computed>
      cluster_autoscaling.#: "0" => <computed>
      cluster_ipv4_cidr: "10.20.0.0/14" => <computed>
      enable_binary_authorization: "" => <computed>
      enable_kubernetes_alpha: "false" => "false"
      enable_legacy_abac: "false" => "false"
      enable_tpu: "" => <computed>
      endpoint: "34.66.113.0" => <computed>
      initial_node_count: "1" => "1"
      instance_group_urls.#: "0" => <computed>
      ip_allocation_policy.#: "0" => <computed>
      location: "us-central1" => "us-central1"
      logging_service: "logging.googleapis.com" => <computed>
      master_auth.#: "1" => "1"
      master_auth.0.client_certificate: "" => <computed>
      master_auth.0.client_certificate_config.#: "1" => "0" (forces new resource)
      master_auth.0.client_key: <sensitive> => <computed> (attribute changed)

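Answer 0 (score: 0)
It looks like you have switched from basic (username/password) authentication to TLS authentication: according to your log, a new certificate is being generated, and that forces a new cluster.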
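
The plan line master_auth.0.client_certificate_config.#: "1" => "0" (forces new resource) suggests that the state holds a client_certificate_config block that the configuration does not declare. One thing that may be worth trying is to declare that block explicitly so both sides agree. This is only a sketch: it assumes the provider version in use accepts client_certificate_config with an issue_client_certificate field inside master_auth, and it has not been verified against the cluster in the question.

```hcl
# Sketch only: fragment of the google_container_cluster resource from the question.
# Assumption: the installed google provider accepts client_certificate_config
# inside master_auth; check the provider documentation for your version.
resource "google_container_cluster" "cluster" {
  name     = "${var.cluster_name}"
  location = "${var.google_region}"

  remove_default_node_pool = true
  initial_node_count       = 1

  master_auth {
    # Empty credentials disable basic authentication, as in the question.
    username = ""
    password = ""

    # Declared explicitly so the configuration has one
    # client_certificate_config block, matching the "1" seen in the state.
    client_certificate_config {
      issue_client_certificate = false
    }
  }
}
```

If this matches the state, a second terraform plan should no longer list master_auth.0.client_certificate_config as forcing a new resource; if the line is still there, this sketch does not apply to your setup.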
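Answer 1 (score: 0)
So, in the end, this is a provider bug, but it is triggered by a change in the behavior of the Kubernetes master service between versions 1.11.x and 1.12.x, which Google only recently rolled out as the default for GKE nodes. It is tracked in the Terraform Google provider's GitHub issues as #3369.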
The workaround is to tell Terraform to ignore changes in master_auth and network:
resource "google_container_cluster" "cluster" {
  master_auth {
    username = ""
    password = ""
  }

  # Workaround for issue 3369 (until provider version 3.0.0?)
  # This is necessary when using GKE node version 1.12.x or later.
  # It is possible to make GKE use node version 1.11.x as an
  # alternative workaround.
  lifecycle {
    ignore_changes = [ "master_auth", "network" ]
  }
}
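
The alternative mentioned in the comment above, keeping GKE on 1.11.x, could be sketched roughly as follows. The variable gke_version is a hypothetical name introduced here for illustration, and no default is given because the 1.11.x versions on offer differ by location and over time. The sketch assumes the min_master_version and node_version arguments of google_container_cluster behave as described in the provider documentation for your provider version.

```hcl
# Sketch only: pin the cluster to a 1.11.x release instead of ignoring changes.
# "gke_version" is a hypothetical variable; set it to a 1.11.x version that
# GKE currently offers in your region or zone.
variable "gke_version" {
  description = "GKE version to pin (a 1.11.x release available in the target location)"
}

resource "google_container_cluster" "cluster" {
  name     = "${var.cluster_name}"
  location = "${var.google_region}"

  remove_default_node_pool = true
  initial_node_count       = 1

  # Assumption: both arguments are supported by the installed provider.
  min_master_version = "${var.gke_version}"
  node_version       = "${var.gke_version}"

  master_auth {
    username = ""
    password = ""
  }
}
```

Pinning an old version is a stopgap, since GKE eventually retires older releases; the lifecycle workaround does not depend on which versions are still offered.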
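NB: Perhaps this will help others who run into the same problem. It is hard to search the web and GitHub for relevant answers to this kind of issue, because authors use many different terms to describe the behavior Terraform exhibits. The problem is also sometimes described as a Terraform idempotency problem or as Terraform detecting spurious changes.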