I'm trying to set up a cluster in Azure with acs-engine, building a Kubernetes cluster that uses VMSS for the agent pools. Once the cluster was up, I added cluster-autoscaler to manage two dedicated agent pools, one CPU and one GPU. Scaling down and scaling up both work as long as the scale set still has at least one running VM, and both scale sets are configured to scale down to 0. Through acs-engine I have set up both scale sets with taints and custom labels. The problem: once a scale set has scaled down to 0, I cannot get the autoscaler to spin a node back up when a new pod is scheduled. I'm not sure what I'm doing wrong, or whether I'm missing some configuration, label, taint, etc. I have only just started with Kubernetes.
Below are my acs-engine JSON, the pod definition, and the log/describe output for the autoscaler and the pod.
Output of kubectl logs -n kube-system cluster-autoscaler-5967b96496-jnvjr:
I0920 16:11:14.925761 1 scale_up.go:249] Pod default/my-test-pod is unschedulable
I0920 16:11:14.999323 1 utils.go:196] Pod my-test-pod can't be scheduled on k8s-pool2-24760778-vmss, predicate failed: GeneralPredicates predicate mismatch, cannot put default/my-test-pod on template-node-for-k8s-pool2-24760778-vmss-6220731686255962863, reason: node(s) didn't match node selector
I0920 16:11:14.999408 1 utils.go:196] Pod my-test-pod can't be scheduled on k8s-pool3-24760778-vmss, predicate failed: GeneralPredicates predicate mismatch, cannot put default/my-test-pod on template-node-for-k8s-pool3-24760778-vmss-3043543739698957784, reason: node(s) didn't match node selector
I0920 16:11:14.999442 1 scale_up.go:376] No expansion options
Output of kubectl describe pod my-test-pod:
Name:               my-test-pod
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"my-test-pod","namespace":"default"},"spec":{"affinity":{"nodeAffinity":{"preferred...
Status:             Pending
IP:
Containers:
  my-test-pod:
    Image:      ubuntu:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -ec
      while :; do echo '.'; sleep 5; done
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qzm6s (ro)
Conditions:
  Type          Status
  PodScheduled  False
Volumes:
  default-token-qzm6s:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qzm6s
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  agentpool=pool2
                 environment=DEV
                 hardware=cpu-spec
                 node-template=k8s-pool2-24760778-vmss
                 vmSize=Standard_D4s_v3
Tolerations:     dedicated=pool2:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Warning  FailedScheduling   2m (x273 over 17m)  default-scheduler   0/3 nodes are available: 3 node(s) didn't match node selector.
  Normal   NotTriggerScaleUp  2m (x89 over 17m)   cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
acs-engine configuration file (rendered and generated with Terraform):
{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.11",
      "kubernetesConfig": {
        "networkPlugin": "azure",
        "clusterSubnet": "${cidr}",
        "privateCluster": {
          "enabled": true
        },
        "addons": [
          {
            "name": "nvidia-device-plugin",
            "enabled": true
          },
          {
            "name": "cluster-autoscaler",
            "enabled": true,
            "config": {
              "minNodes": "0",
              "maxNodes": "2",
              "image": "gcr.io/google-containers/cluster-autoscaler:1.3.1"
            }
          }
        ]
      }
    },
    "masterProfile": {
      "count": ${master_vm_count},
      "dnsPrefix": "${dns_prefix}",
      "vmSize": "${master_vm_size}",
      "storageProfile": "ManagedDisks",
      "vnetSubnetId": "${pool_subnet_id}",
      "firstConsecutiveStaticIP": "${first_master_ip}",
      "vnetCidr": "${cidr}"
    },
    "agentPoolProfiles": [
      {
        "name": "pool3",
        "count": ${dedicated_vm_count},
        "vmSize": "${dedicated_vm_size}",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 31,
        "vnetSubnetId": "${pool_subnet_id}",
        "customNodeLabels": {
          "vmSize": "${dedicated_vm_size}",
          "dedicatedOnly": "true",
          "environment": "${environment}",
          "hardware": "${dedicated_spec}"
        },
        "availabilityProfile": "VirtualMachineScaleSets",
        "scaleSetEvictionPolicy": "Delete",
        "kubernetesConfig": {
          "kubeletConfig": {
            "--register-with-taints": "dedicated=pool3:NoSchedule"
          }
        }
      },
      {
        "name": "pool2",
        "count": ${pool2_vm_count},
        "vmSize": "${pool2_vm_size}",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 31,
        "vnetSubnetId": "${pool_subnet_id}",
        "availabilityProfile": "VirtualMachineScaleSets",
        "customNodeLabels": {
          "vmSize": "${pool2_vm_size}",
          "environment": "${environment}",
          "hardware": "${pool_spec}"
        },
        "kubernetesConfig": {
          "kubeletConfig": {
            "--register-with-taints": "dedicated=pool2:NoSchedule"
          }
        }
      },
      {
        "name": "pool1",
        "count": ${pool1_vm_count},
        "vmSize": "${pool1_vm_size}",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 31,
        "vnetSubnetId": "${pool_subnet_id}",
        "availabilityProfile": "VirtualMachineScaleSets",
        "customNodeLabels": {
          "vmSize": "${pool1_vm_size}",
          "environment": "${environment}",
          "hardware": "${pool_spec}"
        }
      }
    ],
    "linuxProfile": {
      "adminUsername": "${admin_user}",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "${ssh_key}"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "${service_principal_client_id}",
      "secret": "${service_principal_client_secret}"
    }
  }
}
Pod definition file:
apiVersion: v1
kind: Pod
metadata:
  name: my-test-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: vmSize
            operator: In
            values:
            - Standard_D4s_v3
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: hardware
            operator: In
            values:
            - cpu-spec
  nodeSelector:
    agentpool: pool2
    hardware: cpu-spec
    vmSize: Standard_D4s_v3
    environment: DEV
    node-template: k8s-pool2-24760778-vmss
  tolerations:
  - key: dedicated
    operator: Equal
    value: pool2
    effect: NoSchedule
  containers:
  - name: my-test-pod
    image: ubuntu:latest
    command: ["/bin/bash", "-ec", "while :; do echo '.'; sleep 5; done"]
  restartPolicy: Never
I have tried variations of adding and removing entries in nodeAffinity / nodeSelector / tolerations, all with the same result.
I did add pool2 to the autoscaler after the cluster was up. While searching the internet for a solution I keep running into articles about node-template labels, which I believe take the form k8s.io/autoscaler/cluster-autoscaler/node-template/label/value, but that appears to apply to AWS.
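For what it's worth, the node-template mechanism is not AWS-only: the cluster-autoscaler Azure cloud provider (in releases newer than the 1.3.1 image used here) can read node-template labels and taints from tags set on the VMSS itself, which is exactly what the scale-from-zero template node is built from. As a hedged sketch only (tag names taken from the cluster-autoscaler Azure docs, not verified against this CA version; Azure tag names cannot contain `/`, so underscores stand in for it), the tags on the pool2 scale set would look roughly like:

```
# Tags on k8s-pool2-24760778-vmss (sketch; verify the exact tag format
# against the docs for the cluster-autoscaler version actually deployed):
k8s.io_cluster-autoscaler_node-template_label_agentpool = pool2
k8s.io_cluster-autoscaler_node-template_label_hardware  = cpu-spec
k8s.io_cluster-autoscaler_node-template_taint_dedicated = pool2:NoSchedule
```

With tags like these, the simulated template node for an empty scale set carries the same labels and taints a real node would register with, so nodeSelector terms can match even at 0 nodes.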
Can anyone offer me any guidance for this on Azure?

Thanks.
Answer 0 (score: 0)
Update.

I have found the answer. By removing the requiredDuringSchedulingIgnoredDuringExecution node-affinity rule and using only preferredDuringSchedulingIgnoredDuringExecution, the autoscaler now correctly spins up a new VM in the scale set:
apiVersion: v1
kind: Pod
metadata:
  name: my-test-pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: hardware
            operator: In
            values:
            - cpu-spec
  nodeSelector:
    agentpool: pool2
    hardware: cpu-spec
    vmSize: Standard_D4s_v3
    environment: DEV
    node-template: k8s-pool2-24760778-vmss
  tolerations:
  - key: dedicated
    operator: Equal
    value: pool2
    effect: NoSchedule
  containers:
  - name: my-test-pod
    image: ubuntu:latest
    command: ["/bin/bash", "-ec", "while :; do echo '.'; sleep 5; done"]
  restartPolicy: Never
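To confirm the fix end to end, one way to watch the scale-up from 0 happen is the sequence below (a sketch against a live cluster; the manifest filename is an assumption, and the deployment name is the one the acs-engine addon normally creates in kube-system):

```shell
# Re-create the previously unschedulable pod (filename assumed).
kubectl apply -f my-test-pod.yaml

# Follow the autoscaler's decisions; a successful run logs a scale-up
# for the pool2 scale set instead of "No expansion options".
kubectl -n kube-system logs -f deployment/cluster-autoscaler

# In another terminal, watch the new VMSS node register and the pod
# move from Pending to Running once it schedules onto it.
kubectl get nodes -w
kubectl get pod my-test-pod -o wide -w
```

These commands only observe state, so they are safe to run repeatedly while tuning the affinity rules.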