Scaling up a GKE K8s cluster breaks networking

Date: 2020-04-27 22:42:24

Tags: kubernetes google-cloud-platform google-kubernetes-engine

Folks, when I try to grow a GKE cluster from 1 node to 3 nodes, each running in a separate zone (us-central1-a, b, c), the following problems appear:

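Pods scheduled on the new nodes can't reach resources on the internet, i.e. they can't connect to the Stripe API and so on (possibly related to kube-dns; I haven't yet tested whether traffic that doesn't need a DNS lookup gets out).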

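Similarly, traffic isn't routed between pods inside K8s as expected, i.e. it looks like cross-AZ calls may be failing. When testing over OpenVPN, I can't connect to pods scheduled on the new nodes.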

Another issue I noticed is that the Metrics Server behaves oddly: kubectl top nodes shows the new nodes as unknown.

At the time of writing, the master k8s version is 1.15.11-gke.9.

Settings worth noting:

VPC-native (alias IP) - disabled
Intranode visibility - disabled

gcloud container clusters describe cluster-1 --zone us-central1-a

clusterIpv4Cidr: 10.8.0.0/14
createTime: '2017-10-14T23:44:43+00:00'
currentMasterVersion: 1.15.11-gke.9
currentNodeCount: 1
currentNodeVersion: 1.15.11-gke.9
endpoint: 35.192.211.67
initialClusterVersion: 1.7.8
instanceGroupUrls:
- https://www.googleapis.com/compute/v1/projects/skilful-frame-180217/zones/us-central1-a/instanceGroupManagers/gke-cluster-1-default-pool-ff24932a-grp
ipAllocationPolicy: {}
labelFingerprint: a9dc16a7
legacyAbac:
  enabled: true
location: us-central1-a
locations:
- us-central1-a
loggingService: none

....

masterAuthorizedNetworksConfig: {}
monitoringService: none
name: cluster-1
network: default
networkConfig:
  network: .../global/networks/default
  subnetwork: .../regions/us-central1/subnetworks/default
networkPolicy:
  provider: CALICO
nodeConfig:
  diskSizeGb: 100
  diskType: pd-standard
  imageType: COS
  machineType: n1-standard-2
  ...
nodeIpv4CidrSize: 24
nodePools:
- autoscaling: {}
  config:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS
    machineType: n1-standard-2
    ...
  initialNodeCount: 1
  locations:
  - us-central1-a
  management:
    autoRepair: true
    autoUpgrade: true
  name: default-pool
  podIpv4CidrSize: 24
  status: RUNNING
  version: 1.15.11-gke.9
servicesIpv4Cidr: 10.11.240.0/20
status: RUNNING
subnetwork: default
zone: us-central1-a
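When comparing describe output like the above, the network-policy fields are easy to lose in the noise. The helper below is a minimal sketch, assuming the YAML layout shown in this thread (two-space indentation, top-level addonsConfig and networkPolicy keys); the function name netpol_state is hypothetical. It prints "enabled" when networkPolicy.enabled is true and the addon is not disabled, "disabled" when there is no enforcement and no provider, and "inconsistent" for anything in between, such as a provider: CALICO entry with no enabled: true.

```shell
#!/bin/sh
# netpol_state (hypothetical helper): read `gcloud container clusters describe`
# YAML on stdin and classify the network policy state.
# Assumption: two-space indentation as in the describe output in this thread.
netpol_state() {
  awk '
    # A line with no indentation starts a new top-level key.
    /^[^ ]/   { top = $1; child = "" }
    # A line indented by exactly two spaces starts a second-level key.
    /^  [^ ]/ { child = $1 }
    # addonsConfig.networkPolicyConfig.disabled: true means the addon is off.
    top == "addonsConfig:" && child == "networkPolicyConfig:" && /^    disabled: true/ { addon_disabled = 1 }
    # networkPolicy.enabled: true means enforcement is on.
    top == "networkPolicy:" && /^  enabled: true/ { enforced = 1 }
    top == "networkPolicy:" && /^  provider:/     { provider = $2 }
    END {
      if (enforced && !addon_disabled) { state = "enabled" }
      else if (!enforced && provider == "") { state = "disabled" }
      else { state = "inconsistent" }
      print state
    }
  '
}
```

Usage, assuming gcloud's default YAML output for describe: gcloud container clusters describe cluster-1 --zone us-central1-a | netpol_state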

My next troubleshooting step is to create a new pool and migrate to it. Maybe the answer is staring me in the face... could it be nodeIpv4CidrSize /24?
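A quick arithmetic check on the /24 hypothesis, using only the prefix lengths in the describe output. It assumes each node gets its own /24 carved out of clusterIpv4Cidr; under that assumption the /14 holds 2^(24-14) blocks, and the services /20 (10.11.240.0/20 falls inside 10.8.0.0/14) overlaps 2^(24-20) of them. This only speaks to pod-range exhaustion, not to routes or firewall rules.

```shell
#!/bin/sh
# Prefix lengths copied from the describe output above.
cluster_prefix=14    # clusterIpv4Cidr: 10.8.0.0/14
node_prefix=24       # nodeIpv4CidrSize: 24
services_prefix=20   # servicesIpv4Cidr: 10.11.240.0/20 (inside 10.8.0.0/14)

# How many /24 blocks fit in the /14.
node_ranges=$(( 1 << (node_prefix - cluster_prefix) ))
# How many /24 blocks the /20 services range overlaps.
services_blocks=$(( 1 << (node_prefix - services_prefix) ))
remaining=$(( node_ranges - services_blocks ))

echo "/24 blocks in the /14: $node_ranges"
echo "/24 blocks overlapped by the services /20: $services_blocks"
echo "blocks left for per-node pod ranges: $remaining"
```

It prints 1024, 16 and 1008, far more than three nodes need, so running out of per-node ranges looks like an unlikely cause.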

Thanks!

1 answer:

Answer 0 (score: 2)

  • In your question, the cluster description has the following network policy:
name: cluster-1
network: default
networkConfig:
  network: .../global/networks/default
  subnetwork: .../regions/us-central1/subnetworks/default
networkPolicy:
  provider: CALICO
  • I deployed a cluster as close to yours as I could:
gcloud beta container --project "PROJECT_NAME" clusters create "cluster-1" \
--zone "us-central1-a" \
--no-enable-basic-auth \
--cluster-version "1.15.11-gke.9" \
--machine-type "n1-standard-1" \
--image-type "COS" \
--disk-type "pd-standard" \
--disk-size "100" \
--metadata disable-legacy-endpoints=true \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--num-nodes "1" \
--no-enable-ip-alias \
--network "projects/owilliam/global/networks/default" \
--subnetwork "projects/owilliam/regions/us-central1/subnetworks/default" \
--enable-network-policy \
--no-enable-master-authorized-networks \
--addons HorizontalPodAutoscaling,HttpLoadBalancing \
--enable-autoupgrade \
--enable-autorepair
  • With the same configuration as yours, I'll point out two sections:
addonsConfig:
  networkPolicyConfig: {}
...
name: cluster-1
network: default
networkConfig:
  network: projects/owilliam/global/networks/default
  subnetwork: projects/owilliam/regions/us-central1/subnetworks/default
networkPolicy:
  enabled: true
  provider: CALICO
...
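  • In the comments you mentioned: "In the UI it says network policy is disabled... is there a command to remove Calico?" I then gave you the command, and you got an error message saying Network Policy Addon is not Enabled.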

That's strange, because it was applied but not enabled. I DISABLED it on my cluster, and this is the result:

addonsConfig:
  networkPolicyConfig:
    disabled: true
...
name: cluster-1
network: default
networkConfig:
  network: projects/owilliam/global/networks/default
  subnetwork: projects/owilliam/regions/us-central1/subnetworks/default
nodeConfig:
...
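  • networkPolicyConfig changed from {} to disabled: true, and the networkPolicy section above nodeConfig is now gone. So I suggest enabling it and then disabling it again, to see whether that updates the correct resources and fixes your network policy problem. Here is what we'll do: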

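  • If your cluster isn't in production yet, I suggest resizing it down to 1 node, making the changes, and then scaling it back up; the update will be faster. If it is in production, leave it as it is, but the update may take longer depending on your pod disruption budgets. default-pool is the name of the cluster's node pool; I'll resize it in my example: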

$ gcloud container clusters resize cluster-1 --node-pool default-pool --num-nodes 1
Do you want to continue (Y/n)?  y
Resizing cluster-1...done.
  • Then enable the network policy addon itself (this doesn't activate it, it only makes it available):
$ gcloud container clusters update cluster-1 --update-addons=NetworkPolicy=ENABLED
Updating cluster-1...done.                                                                                                                                                      
  • Next, enable (activate) network policy:
$ gcloud container clusters update cluster-1 --enable-network-policy
Do you want to continue (Y/n)?  y
Updating cluster-1...done.                                                                                                                                                      
  • Now let's undo it:
$ gcloud container clusters update cluster-1 --no-enable-network-policy
Do you want to continue (Y/n)?  y
Updating cluster-1...done.    
  • After disabling it, wait for the pool to be ready and run the last command:
$ gcloud container clusters update cluster-1 --update-addons=NetworkPolicy=DISABLED
Updating cluster-1...done.
  • If you scaled down, scale back up to 3:
$ gcloud container clusters resize cluster-1 --node-pool default-pool --num-nodes 3
Do you want to continue (Y/n)?  y
Resizing cluster-1...done.
  • Finally, check the description again to see whether it matches the correct configuration, and test communication between pods.
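The enable/disable sequence above can be wrapped in a script so it runs the same way each time. This is a sketch, not a tested procedure: CLUSTER, ZONE, POOL, FINAL_NODES, SHRINK and DRY_RUN are hypothetical variables defaulting to the values in this thread; --zone is passed explicitly because the commands above omit it (presumably relying on a default zone in the gcloud config); and --quiet is gcloud's global flag for accepting the (Y/n) prompts. The pool-status poll is a coarse stand-in for "wait for the pool to be ready".

```shell
#!/bin/sh
# Hypothetical settings; defaults match the cluster discussed in this thread.
CLUSTER=${CLUSTER:-cluster-1}
ZONE=${ZONE:-us-central1-a}
POOL=${POOL:-default-pool}
FINAL_NODES=${FINAL_NODES:-3}
SHRINK=${SHRINK:-1}   # set to 0 on production clusters to skip the resizes

# Print each command; execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Poll until the node pool reports RUNNING (skipped in dry-run mode).
wait_for_pool() {
  echo "+ wait for $POOL to be RUNNING"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    return 0
  fi
  until [ "$(gcloud container node-pools describe "$POOL" --cluster "$CLUSTER" \
      --zone "$ZONE" --format='value(status)')" = "RUNNING" ]; do
    sleep 30
  done
}

toggle_network_policy() {
  if [ "$SHRINK" = "1" ]; then
    run gcloud container clusters resize "$CLUSTER" --zone "$ZONE" \
      --node-pool "$POOL" --num-nodes 1 --quiet || return 1
  fi
  # Make the addon available, then activate enforcement.
  run gcloud container clusters update "$CLUSTER" --zone "$ZONE" \
    --update-addons=NetworkPolicy=ENABLED --quiet || return 1
  run gcloud container clusters update "$CLUSTER" --zone "$ZONE" \
    --enable-network-policy --quiet || return 1
  # Undo: deactivate enforcement first, then remove the addon.
  run gcloud container clusters update "$CLUSTER" --zone "$ZONE" \
    --no-enable-network-policy --quiet || return 1
  wait_for_pool || return 1
  run gcloud container clusters update "$CLUSTER" --zone "$ZONE" \
    --update-addons=NetworkPolicy=DISABLED --quiet || return 1
  if [ "$SHRINK" = "1" ]; then
    run gcloud container clusters resize "$CLUSTER" --zone "$ZONE" \
      --node-pool "$POOL" --num-nodes "$FINAL_NODES" --quiet || return 1
  fi
}
```

Running DRY_RUN=1 toggle_network_policy prints the commands without touching the cluster, which is a cheap way to review them before a real run.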

Reference for this configuration: Creating a Cluster Network Policy

If you still run into problems after this, update your question with the latest cluster description and we'll dig further.