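Folks, I am trying to scale a GKE cluster from 1 node to 3 nodes, with the nodes running in separate zones (us-central1-a, b, c). The following seems apparent:
Pods scheduled on the new nodes cannot reach resources on the internet, i.e. they cannot connect to the Stripe API etc. (possibly kube-dns related; I have not yet tested traffic that leaves without a DNS lookup).
Similarly, routing between pods in K8s does not work as expected, i.e. it seems cross-AZ calls may be failing? When testing with openvpn, I cannot connect to pods scheduled on the new nodes.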
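To separate the kube-dns hypothesis from a plain routing problem, the checks could be split roughly as follows. This is a sketch, not something that was run on this cluster: the function names are hypothetical, `api.stripe.com` is assumed to be the Stripe endpoint in question, the pod is assumed to ship `nslookup` and `wget` (for example a busybox-based image), and the `gke-cluster-1` route name prefix is an assumption about how the routes are named.

```shell
# Hypothetical helpers. $1 is the name of a pod running on one of the new nodes.

# Step 1: does name resolution work from inside the pod?
check_dns() {
  kubectl exec "$1" -- nslookup api.stripe.com
}

# Step 2: does traffic leave the pod when no DNS lookup is involved?
# $2 is a literal IP address of your choosing.
check_egress_by_ip() {
  kubectl exec "$1" -- wget -q -T 5 -O /dev/null "http://$2"
}

# Step 3: which pod range was assigned to each node?
show_pod_cidrs() {
  kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR
}

# Step 4: with alias IP disabled the cluster is routes-based, so each node's
# pod range should have a matching VPC route.
show_gke_routes() {
  gcloud compute routes list --filter="name~^gke-cluster-1"
}
```

If step 1 fails but step 2 succeeds, DNS is the likelier culprit. If both fail, comparing the output of steps 3 and 4 shows whether the new nodes' pod ranges have routes at all.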
A separate issue I noticed is that the Metrics server seems wonky: kubectl top nodes shows unknown for the new nodes.
At the time of writing, the master k8s version is 1.15.11-gke.9.
Settings to note:
VPC-native (alias IP) - disabled
Intranode visibility - disabled
gcloud container clusters describe cluster-1 --zone us-central1-a
clusterIpv4Cidr: 10.8.0.0/14
createTime: '2017-10-14T23:44:43+00:00'
currentMasterVersion: 1.15.11-gke.9
currentNodeCount: 1
currentNodeVersion: 1.15.11-gke.9
endpoint: 35.192.211.67
initialClusterVersion: 1.7.8
instanceGroupUrls:
- https://www.googleapis.com/compute/v1/projects/skilful-frame-180217/zones/us-central1-a/instanceGroupManagers/gke-cluster-1-default-pool-ff24932a-grp
ipAllocationPolicy: {}
labelFingerprint: a9dc16a7
legacyAbac:
enabled: true
location: us-central1-a
locations:
- us-central1-a
loggingService: none
....
masterAuthorizedNetworksConfig: {}
monitoringService: none
name: cluster-1
network: default
networkConfig:
network: .../global/networks/default
subnetwork: .../regions/us-central1/subnetworks/default
networkPolicy:
provider: CALICO
nodeConfig:
diskSizeGb: 100
diskType: pd-standard
imageType: COS
machineType: n1-standard-2
...
nodeIpv4CidrSize: 24
nodePools:
- autoscaling: {}
config:
diskSizeGb: 100
diskType: pd-standard
imageType: COS
machineType: n1-standard-2
...
initialNodeCount: 1
locations:
- us-central1-a
management:
autoRepair: true
autoUpgrade: true
name: default-pool
podIpv4CidrSize: 24
status: RUNNING
version: 1.15.11-gke.9
servicesIpv4Cidr: 10.11.240.0/20
status: RUNNING
subnetwork: default
zone: us-central1-a
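My next troubleshooting step is to create a new pool and migrate to it. Maybe the answer is staring me right in the face... could it be the nodeIpv4CidrSize of /24?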
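As a quick sanity check on the /24 question, the address arithmetic can be worked out from the two values in the cluster description above (clusterIpv4Cidr 10.8.0.0/14 and nodeIpv4CidrSize 24). This is only arithmetic on prefix lengths; the variable names are illustrative.

```shell
# Values copied from the cluster description above.
cluster_prefix=14   # clusterIpv4Cidr: 10.8.0.0/14
node_prefix=24      # nodeIpv4CidrSize: 24

# Number of per-node /24 ranges that fit inside the /14 cluster range.
node_ranges=$(( 1 << (node_prefix - cluster_prefix) ))

# Number of addresses in each per-node range.
addrs_per_node=$(( 1 << (32 - node_prefix) ))

echo "per-node ranges available: ${node_ranges}"
echo "addresses per node range: ${addrs_per_node}"
```

With 1024 per-node ranges available and only 3 nodes requested, the /24 size by itself does not look like an address exhaustion problem.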
Thanks!
Answer 0 (score: 2)
name: cluster-1
network: default
networkConfig:
network: .../global/networks/default
subnetwork: .../regions/us-central1/subnetworks/default
networkPolicy:
provider: CALICO
gcloud beta container --project "PROJECT_NAME" clusters create "cluster-1" \
--zone "us-central1-a" \
--no-enable-basic-auth \
--cluster-version "1.15.11-gke.9" \
--machine-type "n1-standard-1" \
--image-type "COS" \
--disk-type "pd-standard" \
--disk-size "100" \
--metadata disable-legacy-endpoints=true \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--num-nodes "1" \
--no-enable-ip-alias \
--network "projects/owilliam/global/networks/default" \
--subnetwork "projects/owilliam/regions/us-central1/subnetworks/default" \
--enable-network-policy \
--no-enable-master-authorized-networks \
--addons HorizontalPodAutoscaling,HttpLoadBalancing \
--enable-autoupgrade \
--enable-autorepair
addonsConfig:
networkPolicyConfig: {}
...
name: cluster-1
network: default
networkConfig:
network: projects/owilliam/global/networks/default
subnetwork: projects/owilliam/regions/us-central1/subnetworks/default
networkPolicy:
enabled: true
provider: CALICO
...
Network Policy Addon is not Enabled. That is odd, because network policy is applied but the addon is not enabled. I DISABLED it on my cluster, and look:
addonsConfig:
networkPolicyConfig:
disabled: true
...
name: cluster-1
network: default
networkConfig:
network: projects/owilliam/global/networks/default
subnetwork: projects/owilliam/regions/us-central1/subnetworks/default
nodeConfig:
...
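NetworkPolicyConfig went from {} to disabled: true, and the NetworkPolicy section above nodeConfig is now gone. So I suggest you enable and then disable it again, to see whether that updates the proper resources and fixes your network policy issue. Here is what we will do: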
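If your cluster is not in production, I suggest resizing it back to 1, making the changes, and then scaling up again; the update will be quicker. If it is in production, leave it as it is, but the update may take longer depending on your pod disruption policy. (default-pool is the name of my cluster's node pool.) I will resize it in my example: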
$ gcloud container clusters resize cluster-1 --node-pool default-pool --num-nodes 1
Do you want to continue (Y/n)? y
Resizing cluster-1...done.
$ gcloud container clusters update cluster-1 --update-addons=NetworkPolicy=ENABLED
Updating cluster-1...done.
$ gcloud container clusters update cluster-1 --enable-network-policy
Do you want to continue (Y/n)? y
Updating cluster-1...done.
$ gcloud container clusters update cluster-1 --no-enable-network-policy
Do you want to continue (Y/n)? y
Updating cluster-1...done.
$ gcloud container clusters update cluster-1 --update-addons=NetworkPolicy=DISABLED
Updating cluster-1...done.
$ gcloud container clusters resize cluster-1 --node-pool default-pool --num-nodes 3
Do you want to continue (Y/n)? y
Resizing cluster-1...done.
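After the last resize, one way to confirm the result is to print only the two network policy sections of the cluster description and to re-check node metrics. This is a minimal sketch: the function names are hypothetical, and the cluster name and zone are taken from the question.

```shell
# Print only the network policy sections of the cluster description.
show_network_policy_state() {
  gcloud container clusters describe cluster-1 --zone us-central1-a \
    --format="yaml(addonsConfig.networkPolicyConfig,networkPolicy)"
}

# Re-check whether the new nodes now report metrics.
check_node_metrics() {
  kubectl top nodes
}
```

If the toggle worked, the addon should show disabled: true and the networkPolicy section should be absent, matching the second description shown earlier.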
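Here is the reference for this configuration: Creating a Cluster Network Policy
If you still have the issue after that, update your question with the latest cluster description and we will dig further.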