Kubernetes: how to ensure Pod A only ends up on nodes where Pod B is running

Time: 2020-03-26 16:38:54

Tags: kubernetes

I have two use cases where teams only want Pod A to end up on nodes where Pod B is running. They usually run many replicas of Pod B on a node, but they want only one replica of Pod A running on that same node.

Currently they are using a DaemonSet to manage Pod A, which does not work well because Pod A ends up on many nodes where Pod B is not running. I would prefer not to restrict the nodes they can end up on with labels, because that would limit the node capacity available to Pod B (i.e. if we have 100 nodes and 20 are labeled, Pod B's possible capacity is only 20).

In short: how can I ensure that one replica of Pod A runs on any node that is running at least one Pod B?

3 Answers:

Answer 0 (score: 1)

The current scheduler really doesn't have anything like this. You would need to write something yourself.

Answer 1 (score: 0)

As coderanger already explained, the current scheduler does not support this. The ideal solution would be to write your own scheduler that implements such behavior.
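
For completeness: if you do go the custom-scheduler route, a pod opts into it simply by setting spec.schedulerName; nothing else in the spec has to change. A minimal sketch, assuming a hypothetical scheduler deployed under the name my-pod-b-aware-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: pod-a
  labels:
    app: pod-a
spec:
  # Hand this Pod to the custom scheduler instead of default-scheduler.
  # "my-pod-b-aware-scheduler" is a placeholder for whatever scheduler you deploy.
  schedulerName: my-pod-b-aware-scheduler
  containers:
  - name: pod-a
    image: nginx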

However, you can use podAffinity so that the scheduler at least tries to place the pods on the same node:

spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - <your_value>
          topologyKey: "kubernetes.io/hostname"

It will try to schedule the pods as close to each other as possible.
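
For context, here is roughly what that snippet looks like embedded in a full manifest. This is only a sketch: the Deployment name, labels and image are made up, and app: pod-b stands in for whatever label the Pod B replicas actually carry (the <your_value> above).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-a                  # hypothetical name for the Pod A workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pod-a
  template:
    metadata:
      labels:
        app: pod-a
    spec:
      affinity:
        podAffinity:
          # "preferred" is a soft rule: the scheduler favors nodes that already
          # run pods with the given label, but will still place the pod elsewhere
          # if no such node fits.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - pod-b      # assumed label of the Pod B replicas
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: pod-a
        image: nginx           # placeholder image

Because the rule is only preferred, Pod A can still land on a node without Pod B when no better node is available.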

Answer 2 (score: 0)

As I understand it, you have a Kubernetes cluster with N nodes and you plan to deploy some number of Pods of type B there. Now you want exactly one instance of a Pod of type A to be present only on the nodes where more than zero Pods of type B are scheduled. I assume A <= N, A <= B, and either B > N or B <= N (read <= as "less than or equal").

At the moment you are using the DaemonSet controller to schedule Pods A, and it does not do what you want — among other things, the DaemonSet controller makes scheduling decisions without considering pod priority and preemption. However, you can force the DaemonSet to schedule its pods with the default scheduler instead of the DaemonSet controller (the ScheduleDaemonSetPods feature).

ScheduleDaemonSetPods allows you to schedule DaemonSets using the default scheduler instead of the DaemonSet controller, by adding the NodeAffinity term to the DaemonSet pods instead of the .spec.nodeName term. The default scheduler is then used to bind the pod to the target host. If node affinity of the DaemonSet pod already exists, it is replaced. The DaemonSet controller only performs these operations when creating or modifying DaemonSet pods, and no changes are made to the spec.template of the DaemonSet.
In addition, the node.kubernetes.io/unschedulable:NoSchedule toleration is added automatically to DaemonSet pods. The default scheduler ignores unschedulable nodes when scheduling DaemonSet pods.

So, if we add a required podAffinity to the DaemonSet, N replicas will be created, but only the ones that land on nodes satisfying the (anti-)affinity conditions will be scheduled; the remaining pods will stay in the Pending state.

Here is an example of such a DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ds-splunk-sidecar
  namespace: default
  labels:
    k8s-app: ds-splunk-sidecar
spec:
  selector:
    matchLabels:
      name: ds-splunk-sidecar
  template:
    metadata:
      labels:
        name: ds-splunk-sidecar
    spec:
      affinity:
        # podAntiAffinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - splunk-app
            topologyKey: "kubernetes.io/hostname"
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: ds-splunk-sidecar
        image: nginx
      terminationGracePeriodSeconds: 30

The output of kubectl get pods -o wide | grep splunk:

ds-splunk-sidecar-26cpt          0/1     Pending     0          4s     <none>         <none>         <none>           <none>
ds-splunk-sidecar-8qvpx          1/1     Running     0          4s     10.244.2.87    kube-node2-2   <none>           <none>
ds-splunk-sidecar-gzn7l          0/1     Pending     0          4s     <none>         <none>         <none>           <none>
ds-splunk-sidecar-ls56g          0/1     Pending     0          4s     <none>         <none>         <none>           <none>
splunk-7d65dfdc99-nz6nz          1/2     Running     0          2d     10.244.2.16    kube-node2-2   <none>           <none>

The output of kubectl get pod ds-splunk-sidecar-26cpt -o yaml (one of the Pending pods). Note that a nodeAffinity section has been added automatically to the pod.spec without affecting the parent DaemonSet configuration:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2020-04-02T13:10:23Z"
  generateName: ds-splunk-sidecar-
  labels:
    controller-revision-hash: 77bfdfc748
    name: ds-splunk-sidecar
    pod-template-generation: "1"
  name: ds-splunk-sidecar-26cpt
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: ds-splunk-sidecar
    uid: 4fda6743-74e3-11ea-8141-42010a9c0004
  resourceVersion: "60026611"
  selfLink: /api/v1/namespaces/default/pods/ds-splunk-sidecar-26cpt
  uid: 4fdf96d5-74e3-11ea-8141-42010a9c0004
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - kube-node2-1
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - splunk-app
        topologyKey: kubernetes.io/hostname
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: ds-splunk-sidecar
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-mxvh9
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30  
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  volumes:
  - name: default-token-mxvh9
    secret:
      defaultMode: 420
      secretName: default-token-mxvh9
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-04-02T13:10:23Z"
    message: '0/4 nodes are available: 1 node(s) didn''t match pod affinity rules,
      1 node(s) didn''t match pod affinity/anti-affinity, 3 node(s) didn''t match
      node selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

Alternatively, you can get a similar result with a Deployment controller:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deplA
spec:
  selector:
    matchLabels:
      app: deplA
  replicas: N #<---- Number of nodes in the cluster <= replicas of deplB
  template:
    metadata:
      labels:
        app: deplA
    spec:
      affinity:
        podAntiAffinity: # Prevent scheduling more than one PodA on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - deplA
            topologyKey: "kubernetes.io/hostname"
        podAffinity: # Ensures that PodA is scheduled only if PodB is present on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - deplB
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.16-alpine

Since we can only autoscale a Deployment based on its own pods' metrics (unless you write your own HPA), the number of A replicas has to be set equal to N manually. If there is a node without any Pod B, one of the A pods will stay in the Pending state.
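
One way to handle that manual step, assuming the Deployment is named deplA as in the sketch above, is to set the replica count to the current number of nodes:

kubectl scale deployment deplA --replicas=$(kubectl get nodes --no-headers | wc -l)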

An almost exact example of the setup described in the question, built with Deployments and requiredDuringSchedulingIgnoredDuringExecution pod affinity, can be found in the "More Practical Use-cases: Always co-located in the same node" section of the "Assigning Pods to Nodes" documentation page.

There is only one problem, and it is the same for both approaches: if Pod B gets rescheduled to a different node for some reason and no Pod B is left on a node, the Pod A on that node is not evicted automatically.

This can be worked around by scheduling a CronJob that runs a kubectl image with an appropriate service account and, every ~5 minutes, kills every Pod A that no longer has a corresponding Pod B on the same node. (Please search Stack Overflow for existing solutions, or ask a separate question about the script itself.)
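
A rough sketch of such a CronJob, under the same assumptions as in the Deployment example (Pod A pods are labeled app=deplA, Pod B pods app=deplB) and assuming a service account named pod-reaper with RBAC permission to list and delete pods; treat it as a starting point, not a finished solution:

apiVersion: batch/v1beta1            # use batch/v1 on Kubernetes 1.21+
kind: CronJob
metadata:
  name: reap-orphaned-pod-a
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-reaper    # assumed SA that may list/delete pods
          restartPolicy: OnFailure
          containers:
          - name: reaper
            image: bitnami/kubectl:latest   # any image that ships kubectl
            command:
            - /bin/sh
            - -c
            - |
              # For every node currently running a Pod A, check whether a Pod B
              # is still present there; if not, delete the Pod A replicas on it.
              for node in $(kubectl get pods -l app=deplA -o jsonpath='{.items[*].spec.nodeName}' | tr ' ' '\n' | sort -u); do
                [ -z "$node" ] && continue   # skip Pending pods with no node assigned
                b=$(kubectl get pods -l app=deplB --field-selector spec.nodeName=$node -o name | wc -l)
                if [ "$b" -eq 0 ]; then
                  kubectl delete pods -l app=deplA --field-selector spec.nodeName=$node
                fi
              done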