DNS does not work for one deployment on K8s

Time: 2019-09-11 08:32:06

Tags: kubernetes dns

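I have a multi-deployment application on K8s, and suddenly DNS started failing randomly for one of the components (deployer). From inside the deployer container, if I run a curl command against another component (bridge) using its service name or service IP, I randomly get: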

curl -v http://bridge:9998
* Could not resolve host: bridge
* Expire in 200 ms for 1 (transfer 0x555f0636fdd0)
* Closing connection 0
curl: (6) Could not resolve host: bridge
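A first diagnostic step for "Could not resolve host" is to look at the resolver configuration the deployer container actually received. The sketch below parses `/etc/resolv.conf`-style text into nameservers, search domains and `ndots`. It is a simplified, illustrative parser (not a full reimplementation of glibc's; it ignores `domain` and other options), `parse_resolv_conf` is a hypothetical helper name, and it avoids Python-3-only syntax so it could also run under a python2.7 image.

```python
import os


def parse_resolv_conf(text):
    """Extract nameservers, search domains and ndots from resolv.conf-style text."""
    nameservers, search, ndots = [], [], 1  # 1 is the glibc default for ndots
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or ';' are comments in resolv.conf.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        key, values = parts[0], parts[1:]
        if key == "nameserver" and values:
            nameservers.append(values[0])
        elif key == "search":
            search = values  # a later search line replaces an earlier one
        elif key == "options":
            for opt in values:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return {"nameservers": nameservers, "search": search, "ndots": ndots}


if __name__ == "__main__":
    # Run inside the deployer container to see what it actually received.
    path = "/etc/resolv.conf"
    if os.path.exists(path):
        with open(path) as f:
            print(parse_resolv_conf(f.read()))
```

Comparing this output between the deployer pod and a pod where `bridge` resolves shows quickly whether the two containers even use the same nameserver and search list.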

But if I use the bridge's IP address, it connects fine:

curl -v http://10.36.0.25:9998
* Expire in 0 ms for 6 (transfer 0x558d6c3eadd0)
*   Trying 10.36.0.25...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x558d6c3eadd0)
* Connected to 10.36.0.25 (10.36.0.25) port 9998 (#0)
> GET / HTTP/1.1
> Host: 10.36.0.25:9998
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Accept-Ranges: bytes
< Cache-Control: public, max-age=0
< Last-Modified: Mon, 08 Apr 2019 14:06:42 GMT
< ETag: W/"179-169fd45c550"
< Content-Type: text/html; charset=UTF-8
< Content-Length: 377
< Date: Wed, 11 Sep 2019 08:25:24 GMT
< Connection: keep-alive
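The two curl runs differ only in whether a name lookup happens: a numeric IP needs no DNS query at all. A small sketch using only the standard `socket` module reproduces that distinction; `try_resolve` is a hypothetical helper, and the host names and port come from the question.

```python
import socket


def try_resolve(host, port):
    """Return the sorted addresses getaddrinfo gives for host, or None if lookup fails."""
    try:
        infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        # This is the failure curl reports as "Could not resolve host".
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted(set(info[4][0] for info in infos))
```

Inside the deployer pod, `try_resolve("bridge", 9998)` would be expected to return `None` on the failing attempts, while `try_resolve("10.36.0.25", 9998)` returns `["10.36.0.25"]` without contacting any DNS server, which matches the behaviour shown above.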

And here is my deployer YAML file:

---
apiVersion: v1
kind: Service
metadata:
  annotations:
    Process: deployer
  creationTimestamp: null
  labels:
    io.kompose.service: deployer
  name: deployer
spec:
  ports:
  - name: "8004"
    port: 8004
    targetPort: 8004
  selector:
    io.kompose.service: deployer
status:
  loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    Process: deployer
  creationTimestamp: null
  labels:
    io.kompose.service: deployer
  name: deployer
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: deployer
    spec:
      containers:
      - args:
        - bash
        - -c
        - lttng create && python src/rest.py
        env:
        - name: CONFIG_OVERRIDE
          value: {{ .Values.CONFIG_OVERRIDE | quote}}
        - name: WWS_RTMP_SERVER_URL
          value: {{ .Values.WWS_RTMP_SERVER_URL | quote}}
        - name: WWS_DEPLOYER_DEFAULT_SITE
          value: {{ .Values.WWS_DEPLOYER_DEFAULT_SITE | quote}}
        image: {{ .Values.image }}
        name: deployer
        readinessProbe:
          exec:
            command:
            - ls
            - /tmp
          initialDelaySeconds: 5
          periodSeconds: 5
        ports:
        - containerPort: 8004
        resources:
          requests:
            cpu: 0.1
            memory: 250Mi
          limits:
            cpu: 2
            memory: 5Gi
      restartPolicy: Always
      imagePullSecrets:
      - name: deployersecret
status: {}

As I mentioned, this happens only for this component. I ran exactly the same command from inside other pods and it worked. Any idea how to fix this?
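Since the failure is described as random and limited to one pod, it can help to quantify it and run the same measurement in a healthy pod for comparison. A minimal sketch, assuming Python is available in both containers (it is in the deployer image, which runs `python src/rest.py`); `failure_rate` and the default attempt count are hypothetical choices, and the resolver is passed in as a parameter so the loop can be tested without a cluster.

```python
import socket


def failure_rate(host, attempts=50, resolve=socket.gethostbyname):
    """Resolve host repeatedly and return the fraction of attempts that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = 0
    for _ in range(attempts):
        try:
            resolve(host)
        except (socket.gaierror, socket.herror):
            failures += 1
    return failures / float(attempts)
```

A clearly non-zero rate for `failure_rate("bridge")` in the deployer pod and 0.0 in the other pods would confirm that the problem follows this container rather than the bridge Service.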

UPDATE

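Since people have misunderstood, let me describe the situation in more detail: the YAML file above belongs to the component that has this problem (the other components work fine), and the curl commands are the ones I ran from inside this problematic pod. If I run exactly the same command from another pod, the name resolves. Below are the Deployment and Service of the target, for reference: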

apiVersion: v1
kind: Service
metadata:
  annotations:
    Process: bridge
  creationTimestamp: null
  labels:
    io.kompose.service: bridge
  name: bridge
spec:
  ports:
  - name: "9998"
    port: 9998
    targetPort: 9998
  - name: "9226"
    port: 9226
    targetPort: 9226
  - name: 9226-udp
    port: 9226
    protocol: UDP
    targetPort: 9226
  selector:
    io.kompose.service: bridge
status:
  loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    Process: bridge
  creationTimestamp: null
  labels:
    io.kompose.service: bridge
  name: bridge
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: bridge
    spec:
      containers:
      - args:
        - bash
        - -c
        - npm run startDebug
        env:
        - name: NODE_ENV
          value: {{ .Values.NODE_ENV | quote }}
        image: {{ .Values.image }}
        name: bridge
        readinessProbe:
          httpGet:
            port: 9998
          initialDelaySeconds: 3
          periodSeconds: 15
        ports:
        - containerPort: 9998
        - containerPort: 9226
        - containerPort: 9226
          protocol: UDP
        resources:
          requests:
            cpu: 0.1
            memory: 250Mi
          limits:
            cpu: 2
            memory: 5Gi
      restartPolicy: Always
      imagePullSecrets:
      - name: bridgesecret
status: {}
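The short name `bridge` only resolves because the pod's stub resolver appends the search domains from `/etc/resolv.conf` before querying the cluster DNS. The sketch below lists the names a glibc-style resolver would try, in order; `candidate_names` is a hypothetical helper, and the `default` namespace and `ndots: 5` in the usage example are assumptions, not values from the question.

```python
def candidate_names(name, search, ndots):
    """Return the names a glibc-style stub resolver tries, in order, for `name`."""
    if name.endswith("."):
        return [name]  # fully qualified: no search-list expansion
    expanded = [name + "." + domain for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]  # too few dots: try the search list first


# Hypothetical search list for a pod in the "default" namespace.
print(candidate_names("bridge", ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"], 5))
```

If the fully qualified name (for example `bridge.<namespace>.svc.cluster.local.` with a trailing dot) resolves from the deployer pod while plain `bridge` does not, that would suggest a problem with how the image handles the search list rather than with the bridge Service itself.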

3 answers:

Answer 0 (score: 2)

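The problem was the image I was using. The problematic component, as well as one other component, used python2.7-based images (with different configurations), and both had DNS problems, while all the other components worked fine. I rebuilt my image based on Ubuntu, and now everything works.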

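I think it might be related to the Go implementation that CoreDNS uses: for some reason the Python images don't work properly with it. A colleague of mine told me this; he ran into the same problem earlier on another project he was doing in Go.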
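If the root cause is the base image, as this answer concludes, one way to compare the failing python2.7-based image with the Ubuntu-based one is to print what each runtime is built on. A minimal sketch, meant to be run inside each container; `runtime_summary` is a hypothetical helper, and the C library is included only as one plausible difference between images, not as a confirmed cause.

```python
import platform
import sys


def runtime_summary():
    """Collect runtime details that can differ between container images."""
    libc, libc_version = platform.libc_ver()
    return {
        "python": sys.version.split()[0],
        # libc_ver() returns empty strings when it cannot identify the C library.
        "libc": libc or "unknown",
        "libc_version": libc_version or "unknown",
    }


if __name__ == "__main__":
    print(runtime_summary())
```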

Answer 1 (score: 0)

Your service has port 8004 open,

but you are sending curl on port 9998:

curl -v http://bridge:9998

Because of this mismatch, I think it does not work.

Also, the service is exposed as a LoadBalancer, so from outside the cluster you have to use the load balancer's IP address to access it.

If you want the name to resolve inside the cluster, you can use the service name, like

http://bridge:9998

Only from outside, over the Internet, would you access it through the load balancer.
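The port in a URL such as `http://bridge:9998` is matched against the `ports` of the Service that the name refers to (here `bridge`), not against the Service of the pod making the request. A sketch that checks this against the two Service manifests from the question, transcribed by hand into dicts so that no YAML parser is assumed; `service_exposes` is a hypothetical helper.

```python
def service_exposes(service, port, protocol="TCP"):
    """Return True if a Service manifest (as a dict) lists `port` with `protocol`."""
    for entry in service.get("spec", {}).get("ports", []):
        # Kubernetes defaults a Service port's protocol to TCP when it is omitted.
        if entry.get("port") == port and entry.get("protocol", "TCP") == protocol:
            return True
    return False


# Port lists copied from the bridge and deployer Services in the question.
bridge = {"spec": {"ports": [
    {"name": "9998", "port": 9998, "targetPort": 9998},
    {"name": "9226", "port": 9226, "targetPort": 9226},
    {"name": "9226-udp", "port": 9226, "protocol": "UDP", "targetPort": 9226},
]}}
deployer = {"spec": {"ports": [{"name": "8004", "port": 8004, "targetPort": 8004}]}}
```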

Answer 2 (score: 0)

You defined "targetPort: 8004", so you are publishing the service on that port. Why are you curling the service on a different port, 9998?