Question

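We have a 5-node cluster that was moved behind our corporate firewall/proxy server.

Following the directions here: setting-up-standalone-kubernetes-cluster-behind-corporate-proxy

I set the proxy server environment variables using: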

export http_proxy=http://proxy-host:proxy-port/
export HTTP_PROXY=$http_proxy
export https_proxy=$http_proxy
export HTTPS_PROXY=$http_proxy
printf -v lan '%s,' localip_of_machine
printf -v pool '%s,' 192.168.0.{1..253}
printf -v service '%s,' 10.96.0.{1..253}
export no_proxy="${lan%,},${service%,},${pool%,},127.0.0.1";
export NO_PROXY=$no_proxy

Now everything in our cluster works internally. However, when I try to create a pod that pulls an image from outside, the pod gets stuck in ContainerCreating, e.g.,

[gms@thalia0 ~]$ kubectl apply -f https://k8s.io/examples/admin/dns/busybox.yaml
pod/busybox created

and it is stuck here:

[gms@thalia0 ~]$ kubectl get pods
NAME                            READY   STATUS              RESTARTS   AGE
busybox                         0/1     ContainerCreating   0          17m

I assume this is because the host/domain the image is pulled from is not covered by our corporate proxy rules. We do have rules for

k8s.io
kubernetes.io
docker.io
docker.com

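so I'm not sure what other hosts/domains need to be added.

I ran describe pods for busybox and saw references to node.kubernetes.io (I am adding a domain-wide exception for *.kubernetes.io, which will hopefully suffice).

This is what I get from kubectl describe pods busybox: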

Volumes:
  default-token-2kfbw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-2kfbw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age   From                          Message
  ----     ------                  ----  ----                          -------
  Normal   Scheduled               73s   default-scheduler             Successfully assigned default/busybox to thalia3.ahc.umn.edu
  Warning  FailedCreatePodSandBox  10s   kubelet, thalia3.ahc.umn.edu  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "6af48c5dadf6937f9747943603a3951bfaf25fe1e714cb0b0cbd4ff2d59aa918" network for pod "busybox": NetworkPlugin cni failed to set up pod "busybox_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout, failed to clean up sandbox container "6af48c5dadf6937f9747943603a3951bfaf25fe1e714cb0b0cbd4ff2d59aa918" network for pod "busybox": NetworkPlugin cni failed to teardown pod "busybox_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout]
  Normal   SandboxChanged          10s   kubelet, thalia3.ahc.umn.edu  Pod sandbox changed, it will be killed and re-created.

I assume the calico error is due to this:

Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s

The calico and coredns pods seem to have similar errors reaching node.kubernetes.io, so I assume this is because our servers cannot pull down new images on a restart.

Answer 1

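It looks like you are misunderstanding a few Kubernetes concepts that I'd like to help clarify here. A reference to node.kubernetes.io is not an attempt to make a network call to that domain. It is simply the convention Kubernetes uses to specify string keys. So if you ever have to apply labels, annotations, or tolerations, you would define your own keys in the same style, such as subdomain.domain.tld/some-key.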
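To illustrate that such a key is only a string of the form prefix/name, here is a minimal sketch in POSIX shell that splits a key on its first slash. The function name split_key is hypothetical and not part of any Kubernetes tooling; nothing in it resolves or contacts the prefix.

```shell
# Hypothetical helper: split a Kubernetes-style key into its prefix and name.
# This is pure string handling; no DNS lookup or network call is involved.
split_key() {
  case $1 in
    */*) printf 'prefix=%s name=%s\n' "${1%%/*}" "${1#*/}" ;;
    *)   printf 'prefix= name=%s\n' "$1" ;;
  esac
}

split_key node.kubernetes.io/not-ready
# prints: prefix=node.kubernetes.io name=not-ready
```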

As for the Calico issue you are experiencing, it looks like the error:

network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout]

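is our culprit here. 10.96.0.1 is the IP address used to reach the Kubernetes API server from within pods. It seems the calico/node pod running on your node is failing to reach the API server. Could you give more context on how you set up Calico? Do you know which version of Calico you are running?

The fact that your calico/node instance is trying to access the crd.projectcalico.org/v1/clusterinformations resource tells me that it is using the Kubernetes datastore as its backend. Are you sure you're not trying to run Calico in etcd mode?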
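Since the failing request targets 10.96.0.1, one thing worth ruling out is whether that address is actually present in the no_proxy list built in the question. The following is a minimal sketch in POSIX shell; in_no_proxy is a hypothetical helper, and it only does exact string matching against a comma-separated list. It does not model the CIDR or domain-suffix matching that some clients implement.

```shell
# Hypothetical helper: succeed if host $1 appears verbatim in the
# comma-separated list $2 (defaults to the current $no_proxy).
in_no_proxy() {
  case ",${2-${no_proxy-}}," in
    *",$1,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if in_no_proxy 10.96.0.1; then
  echo "10.96.0.1 is listed in no_proxy"
else
  echo "10.96.0.1 is NOT listed in no_proxy"
fi
```

Keep in mind that this reflects only the environment of the shell it runs in; the kubelet, the container runtime, and the pods themselves may have been started with different proxy settings.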

Answer 2

Pulling the image does not appear to be the problem; if it were, you would see the ImagePullBackOff status. (Although that might still show up later, after the error message you are seeing now.)

The error you see from your pods comes from them being unable to connect to the kube-apiserver internally. It looks like a timeout, so most likely something is wrong with the kubernetes service in your default namespace. You can check it like this, for example:

$ kubectl -n default get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   2d20h

Could it be missing? You can always re-create it:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
spec:
  clusterIP: 10.96.0.1
  type: ClusterIP
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
EOF

The toleration basically says that the pod can tolerate being scheduled on a node that has the node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute taints, but your error does not look related to that.
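To compare the service's cluster IP with the address in the error message without reading the table by eye, a small filter can pull out the CLUSTER-IP column. This is a minimal sketch assuming awk is available and the default table layout of kubectl get svc shown above; cluster_ip_of is a hypothetical helper name.

```shell
# Hypothetical helper: read the table printed by `kubectl get svc` on stdin
# and print the CLUSTER-IP (third column) of the service named $1.
cluster_ip_of() {
  awk -v name="$1" '$1 == name { print $3 }'
}
```

It would be used as: kubectl -n default get svc | cluster_ip_of kubernetes. With the output shown above this prints 10.96.0.1; empty output would suggest the service is missing.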

Answer 3

This issue normally means the docker daemon is unable to respond.

It can occur if some other service is consuming a lot of CPU or I/O.

Running a kubernetes kubeadm cluster behind a corporate firewall/proxy server
