Kubernetes cluster migration

Time: 2018-07-02 20:59:57

Tags: amazon-web-services kubernetes

I currently have multiple AWS accounts, each with its own Kubernetes cluster. Unfortunately, when the clusters were originally deployed with kops, the VPCs were created with overlapping CIDR blocks. Normally this wasn't a problem, since each cluster essentially lived in its own universe.

Things have changed a bit, and now we want to implement cross-account VPC peering. The idea is that users connected over VPN can reach all resources through that peering. My understanding is that the overlapping CIDR blocks will become a major problem once peering is in place.

It doesn't seem possible to simply change the CIDR block of an existing cluster. Is my only option to back up and restore the cluster into a new VPC with something like Ark? Has anyone done a full cluster migration? I'd be curious whether there is a better answer.

1 answer:

Answer 0 (score: 2)

Your understanding is correct: with kops you can't change the CIDR block of an existing cluster; it is stuck in the VPC in which it was created, and you can't change the CIDR block of a VPC:

  The IP address range of a VPC is made up of the CIDR blocks associated with it. You select one CIDR block when you create the VPC, and you can add or remove secondary CIDR blocks later. The CIDR block that you add when you create the VPC cannot be changed, but you can add and remove secondary CIDR blocks to change the IP address range of the VPC. (emphasis mine)
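
Not from the answer itself, but for context: the secondary-CIDR option mentioned in that quote looks roughly like the following with the AWS CLI (the VPC ID and CIDR here are placeholders). It does not change the primary block, so it doesn't solve the overlap problem described above.

# Associate an additional (secondary) CIDR block with an existing VPC
aws ec2 associate-vpc-cidr-block --vpc-id vpc-0abc123 --cidr-block 10.1.0.0/16

# Inspect the VPC's CIDR block associations
aws ec2 describe-vpcs --vpc-ids vpc-0abc123 --query 'Vpcs[].CidrBlockAssociationSet'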

That brings us to the second point: migrating the cluster. This can be broken down into two phases:

  1. Migrating the infrastructure managed by kops
  2. Migrating the workloads on the cluster

1. Migrating the infrastructure managed by kops

You'll need to migrate (i.e. recreate) the kops cluster itself: the EC2 instances, the kops InstanceGroup and Cluster objects, the various pieces of AWS infrastructure, and so on. For that you can use the kops toolbox template command:

kops toolbox template --values /path/to/values.yaml --template /path/to/cluster/template.yaml > /path/to/output/cluster.yaml
kops create -f /path/to/output/cluster.yaml

This is a Helm-like tool that lets you template out your kops cluster configuration and pass in different values.yaml files. You may want to include this command in a small shell script wrapper or Makefile to create simple one-command cluster deploys, so that standing up your k8s cluster infrastructure is easy and repeatable.
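
For illustration, a minimal wrapper along those lines might look like the sketch below. This is not from the answer: the file names, the CLUSTER_SUBDOMAIN variable (which the sample template further down reads from the environment), and the SSH key path are assumptions.

#!/usr/bin/env bash
# deploy-cluster.sh -- hypothetical one-command kops deploy wrapper (first-time deploy)
# Assumes KOPS_STATE_STORE is already exported for the target account.
set -euo pipefail

: "${CLUSTER_SUBDOMAIN:?set CLUSTER_SUBDOMAIN, e.g. 'staging'}"
VALUES_FILE="${1:-values.yaml}"
CLUSTER_NAME="${CLUSTER_SUBDOMAIN}.k8s.example.io"
RENDERED="$(mktemp)"

# Render the cluster spec from the template and this environment's values
kops toolbox template \
  --values "${VALUES_FILE}" \
  --template template.yaml > "${RENDERED}"

# Register the Cluster/InstanceGroup objects in the kops state store,
# add an SSH key for the nodes, then build the AWS resources
kops create -f "${RENDERED}"
kops create secret --name "${CLUSTER_NAME}" sshpublickey admin -i ~/.ssh/id_rsa.pub
kops update cluster "${CLUSTER_NAME}" --yes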

A sample cluster template.yaml file and values.yaml file might look like the following, including the specs for the Cluster as well as the master, worker, and autoscale InstanceGroups:

# template.yaml
{{ $clusterSubdomain := (env "CLUSTER_SUBDOMAIN") }}
{{ $subnetCidr := (env "SUBNET_CIDR") }}

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: {{ $clusterSubdomain }}.k8s.example.io
spec:
  hooks:
  - manifest: |
      [Unit]
      Description=Create example user
      ConditionPathExists=!/home/example/.ssh/authorized_keys

      [Service]
      Type=oneshot
      ExecStart=/bin/sh -c 'useradd example && echo "{{ .examplePublicKey }}" > /home/example/.ssh/authorized_keys'
    name: useradd-example.service
    roles:
    - Node
    - Master
  - manifest: |
      Type=oneshot
      ExecStart=/usr/bin/coreos-cloudinit --from-file=/home/core/cloud-config.yaml
    name: reboot-window.service
    roles:
    - Node
    - Master
  kubeAPIServer:
    authorizationRbacSuperUser: admin
    featureGates:
      TaintBasedEvictions: "true"
  kubeControllerManager:
    featureGates:
      TaintBasedEvictions: "true"
    horizontalPodAutoscalerUseRestClients: false
  kubeScheduler:
    featureGates:
      TaintBasedEvictions: "true"
  kubelet:
    featureGates:
      TaintBasedEvictions: "true"
  fileAssets:
  - content: |
      yes
    name: docker-1.12
    path: /etc/coreos/docker-1.12
    roles:
    - Node
    - Master
  - content: |
      #cloud-config
      coreos:
        update:
          reboot-strategy: "etcd-lock"
        locksmith:
          window-start: {{ .locksmith.windowStart }}
          window-length: {{ .locksmith.windowLength }}
    name: cloud-config.yaml
    path: /home/core/cloud-config.yaml
    roles:
    - Node
    - Master
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://my-bucket.example.io/{{ $clusterSubdomain }}.k8s.example.io
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-{{ .zone }}
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-{{ .zone }}
      name: a
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - {{ .apiAccessCidr }}
  kubernetesVersion: {{ .k8sVersion }}
  masterPublicName: api.{{ $clusterSubdomain }}.k8s.example.io
  networkCIDR: {{ .vpcCidr }}
  networkID: {{ .vpcId }}
  networking:
    canal: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - {{ .sshAccessCidr }}
  subnets:
  - cidr: {{ $subnetCidr }}
    name: {{ .zone }}
    type: Public
    zone: {{ .zone }}
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: {{ $clusterSubdomain }}.k8s.example.io
  name: master-{{ .zone }}
spec:
{{- if .additionalSecurityGroups }}
  additionalSecurityGroups:
{{- range .additionalSecurityGroups }}
  - {{ . }}
{{- end }}
{{- end }}
  image: {{ .image }}
  machineType: {{ .awsMachineTypeMaster }}
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-{{ .zone }}
  role: Master
  subnets:
  - {{ .zone }}
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: {{ $clusterSubdomain }}.k8s.example.io
  name: nodes
spec:
{{- if .additionalSecurityGroups }}
  additionalSecurityGroups:
{{- range .additionalSecurityGroups }}
  - {{ . }}
{{- end }}
{{- end }}
  image: {{ .image }}
  machineType: {{ .awsMachineTypeNode }}
  maxSize: {{ .nodeCount }}
  minSize: {{ .nodeCount }}
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - {{ .zone }}
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: ag.{{ $clusterSubdomain }}.k8s.example.io
  labels:
    kops.k8s.io/cluster: {{ $clusterSubdomain }}.k8s.example.io
spec:
{{- if .additionalSecurityGroups }}
  additionalSecurityGroups:
{{- range .additionalSecurityGroups }}
  - {{ . }}
{{- end }}
{{- end }}
  image: {{ .image }}
  machineType: {{ .awsMachineTypeAg }}
  maxSize: 10
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: ag.{{ $clusterSubdomain }}.k8s.example.io
  role: Node
  subnets:
  - {{ .zone }}
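
Note that the template above pulls CLUSTER_SUBDOMAIN and SUBNET_CIDR from the environment (via the env template function) rather than from values.yaml, so a render needs those exported first; for example (the values here are just placeholders):

export CLUSTER_SUBDOMAIN=staging
export SUBNET_CIDR=172.23.16.0/20
kops toolbox template --values values.yaml --template template.yaml > cluster.yaml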

And the values.yaml file:

# values.yaml:

region: us-west-2
zone: us-west-2a
environment: staging
image: ami-abc123
awsMachineTypeNode: c5.large
awsMachineTypeMaster: m5.xlarge
awsMachineTypeAg: c5.large
nodeCount: "2"
k8sVersion: "1.9.3"
vpcId: vpc-abc123
vpcCidr: 172.23.0.0/16
apiAccessCidr: <e.g. office ip>
sshAccessCidr: <e.g. office ip>
additionalSecurityGroups:
- sg-def234 # kubernetes-standard
- sg-abc123 # example scan engine targets
examplePublicKey: "ssh-rsa ..."
locksmith:
  windowStart: Mon 16:00 # 8am Monday PST
  windowLength: 4h

2. Migrating the workloads on the cluster

I don't have any hands-on experience with Ark, but it does seem to fit your use case well:

  Cluster migration

  Using Backups and Restores

  Heptio Ark can help you port your resources from one cluster to another, as long as you point each Ark Config to the same cloud object storage. In this scenario, we are also assuming that your clusters are hosted by the same cloud provider. Note that Heptio Ark does not support the migration of persistent volumes across cloud providers.

  (Cluster 1) Assuming you haven't already been checkpointing your data with the Ark schedule operation, you need to first back up your entire cluster (replacing <BACKUP-NAME> as desired):

  ark backup create <BACKUP-NAME>

  The default TTL is 30 days (720 hours); you can use the --ttl flag to change this as necessary.

  (Cluster 2) Make sure that the persistentVolumeProvider and backupStorageProvider fields in the Ark Config match the ones from Cluster 1, so that your new Ark server instance is pointing to the same bucket.

  (Cluster 2) Make sure that the Ark Backup object has been created. Ark resources are synced with the backup files available in cloud storage.
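
Putting those quoted steps together, the workload migration itself comes down to a couple of Ark commands; a rough sketch (the backup name is a placeholder, and the exact restore syntax varies slightly between Ark releases):

# On cluster 1: back up everything (default TTL is 720h; override with --ttl)
ark backup create cidr-migration --ttl 720h

# On cluster 2, with its Ark Config pointing at the same bucket:
ark backup get                                   # wait for the backup object to sync
ark restore create --from-backup cidr-migration  # older releases: ark restore create cidr-migration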

Configuring Ark on an AWS cluster looks fairly straightforward: https://github.com/heptio/ark/blob/master/docs/aws-config.md
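
From memory of that doc, the AWS setup is roughly the following; treat the bucket name, namespace, and manifest paths as assumptions and defer to the linked doc for the exact steps and IAM policy:

# Create an S3 bucket for Ark's backups
aws s3api create-bucket --bucket my-ark-backups --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2

# Install Ark's prerequisites (namespace, CRDs) and hand it IAM user credentials
kubectl apply -f examples/common/00-prereqs.yaml
kubectl create secret generic cloud-credentials \
  --namespace heptio-ark --from-file cloud=credentials-ark

# Edit the Ark Config for your bucket/region, then deploy the Ark server
kubectl apply -f examples/aws/00-ark-config.yaml
kubectl apply -f examples/aws/10-deployment.yaml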

With some initial setup of the kops toolbox script and the Ark config, you should have a clean, repeatable way to migrate clusters and, as the meme goes, turn your pets into cattle.