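I have a multi-deployment application on K8s, and DNS has suddenly started failing at random for one of the components (deployer). From inside the deployer container, if I run curl against another component (bridge) using its service name or service IP, I randomly get: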
curl -v http://bridge:9998
* Could not resolve host: bridge
* Expire in 200 ms for 1 (transfer 0x555f0636fdd0)
* Closing connection 0
curl: (6) Could not resolve host: bridge
However, if I use the bridge's IP, it can resolve and connect:
curl -v http://10.36.0.25:9998
* Expire in 0 ms for 6 (transfer 0x558d6c3eadd0)
* Trying 10.36.0.25...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x558d6c3eadd0)
* Connected to 10.36.0.25 (10.36.0.25) port 9998 (#0)
> GET / HTTP/1.1
> Host: 10.36.0.25:9998
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Accept-Ranges: bytes
< Cache-Control: public, max-age=0
< Last-Modified: Mon, 08 Apr 2019 14:06:42 GMT
< ETag: W/"179-169fd45c550"
< Content-Type: text/html; charset=UTF-8
< Content-Length: 377
< Date: Wed, 11 Sep 2019 08:25:24 GMT
< Connection: keep-alive
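The two curl runs above fail or succeed at different stages: `curl: (6)` is reported when the name lookup fails, before any TCP connection is attempted, while the run against the IP gets past both the lookup and the connect. As a rough way to reproduce that distinction from inside a pod that has Python available, here is a minimal sketch. The helper name `probe` and the 2 second timeout are illustrative assumptions, not part of the original setup.

```python
import socket


def probe(host, port, timeout=2.0):
    """Report which stage fails for host:port: 'dns', 'connect', or 'ok'."""
    try:
        # Stage 1: name resolution (what curl reports as error 6 on failure).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"

    # Stage 2: try a TCP connection to each resolved address in turn.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "connect"


# Example calls, to be run inside the affected pod:
#   probe("bridge", 9998)       # service name, goes through cluster DNS
#   probe("10.36.0.25", 9998)   # literal IP, no DNS lookup involved
```

If the first call returns `'dns'` while the second returns `'ok'`, the target is reachable and only name resolution is broken in that pod.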
And here is my deployer YAML file:
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    Process: deployer
  creationTimestamp: null
  labels:
    io.kompose.service: deployer
  name: deployer
spec:
  ports:
  - name: "8004"
    port: 8004
    targetPort: 8004
  selector:
    io.kompose.service: deployer
status:
  loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    Process: deployer
  creationTimestamp: null
  labels:
    io.kompose.service: deployer
  name: deployer
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: deployer
    spec:
      containers:
      - args:
        - bash
        - -c
        - lttng create && python src/rest.py
        env:
        - name: CONFIG_OVERRIDE
          value: {{ .Values.CONFIG_OVERRIDE | quote}}
        - name: WWS_RTMP_SERVER_URL
          value: {{ .Values.WWS_RTMP_SERVER_URL | quote}}
        - name: WWS_DEPLOYER_DEFAULT_SITE
          value: {{ .Values.WWS_DEPLOYER_DEFAULT_SITE | quote}}
        image: {{ .Values.image }}
        name: deployer
        readinessProbe:
          exec:
            command:
            - ls
            - /tmp
          initialDelaySeconds: 5
          periodSeconds: 5
        ports:
        - containerPort: 8004
        resources:
          requests:
            cpu: 0.1
            memory: 250Mi
          limits:
            cpu: 2
            memory: 5Gi
      restartPolicy: Always
      imagePullSecrets:
      - name: deployersecret
status: {}
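As I mentioned, this only happens on this component. I ran the exact same command from inside other pods and it works properly. Any idea how I can solve this?
Since people are getting this wrong, I'll describe the situation further: the YAML file above belongs to the component that is facing the problem (the other components work fine), and the curl command is the one I run from inside this problematic pod. If I run the exact same command from another container, it resolves. Below are the deployment and service of the target, for your reference: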
apiVersion: v1
kind: Service
metadata:
  annotations:
    Process: bridge
  creationTimestamp: null
  labels:
    io.kompose.service: bridge
  name: bridge
spec:
  ports:
  - name: "9998"
    port: 9998
    targetPort: 9998
  - name: "9226"
    port: 9226
    targetPort: 9226
  - name: 9226-udp
    port: 9226
    protocol: UDP
    targetPort: 9226
  selector:
    io.kompose.service: bridge
status:
  loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    Process: bridge
  creationTimestamp: null
  labels:
    io.kompose.service: bridge
  name: bridge
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: bridge
    spec:
      containers:
      - args:
        - bash
        - -c
        - npm run startDebug
        env:
        - name: NODE_ENV
          value: {{ .Values.NODE_ENV | quote }}
        image: {{ .Values.image }}
        name: bridge
        readinessProbe:
          httpGet:
            port: 9998
          initialDelaySeconds: 3
          periodSeconds: 15
        ports:
        - containerPort: 9998
        - containerPort: 9226
        - containerPort: 9226
          protocol: UDP
        resources:
          requests:
            cpu: 0.1
            memory: 250Mi
          limits:
            cpu: 2
            memory: 5Gi
      restartPolicy: Always
      imagePullSecrets:
      - name: bridgesecret
status: {}
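Answer 0 (score: 2)
The problem was the image I was using. The component with the problem, along with one other component, was using an image based on python2.7 (with a different configuration), and both had DNS problems, while all the other components worked properly. I rebuilt the image based on Ubuntu and now everything is fine.
I think this might be related to the Go implementation that CoreDNS uses: for some reason the Python image does not work properly with that implementation. This is what a colleague told me; he had run into the same problem earlier on another project he did with Go.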
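The explanation above is a guess rather than a confirmed cause. One way to narrow it down would be to compare `/etc/resolv.conf` in a working pod and in a failing pod, and to look at which fully qualified names the resolver would try for a short name such as `bridge`. The sketch below parses resolver configuration text and lists those candidates following the usual `search`/`ndots` rules. The function names and the sample configuration (nameserver address, `default` namespace, `ndots:5`) are illustrative assumptions, not values taken from this cluster.

```python
def parse_resolv_conf(text):
    """Extract nameservers, search domains and ndots from resolv.conf text."""
    nameservers, search, ndots = [], [], 1  # ndots defaults to 1
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blanks and comment lines
            continue
        fields = line.split()
        keyword, values = fields[0], fields[1:]
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "search":
            search = values  # a later search line replaces an earlier one
        elif keyword == "options":
            for opt in values:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return nameservers, search, ndots


def candidates(name, search, ndots):
    """List the fully qualified names tried for `name`, in order."""
    if name.endswith("."):
        return [name]  # already absolute, search list is not applied
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    # Names with fewer dots than ndots go through the search list first.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Illustrative configuration only (assumed values, not from the question).
SAMPLE = """\
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

servers, search_domains, ndots = parse_resolv_conf(SAMPLE)
for fqdn in candidates("bridge", search_domains, ndots):
    print(fqdn)
```

Inside a real pod, the text would come from `open("/etc/resolv.conf").read()` instead of `SAMPLE`. If the two pods show the same configuration, the difference is more likely in the image's resolver library than in the cluster DNS settings.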
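Answer 1 (score: 0)
Your service exposes port 8004,
while you are sending curl to port 9998:
curl -v http://bridge:9998
I think it does not work because of this mismatch.
Also, if the service is exposed as a LoadBalancer, then from outside the cluster you have to use the load balancer's IP address to access the service.
If you want to resolve it from inside the cluster, you can use the service name, like
http://bridge:9998
Only from outside, over the internet, do you reach it through the load balancer.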
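For reference, the in-cluster service name mentioned above can be written in several equivalent forms, from the short name up to the fully qualified one. The sketch below just builds those URL forms. The helper name `service_urls`, the `default` namespace and the `cluster.local` cluster domain are assumptions, since the question does not state which namespace or domain is in use.

```python
def service_urls(service, port, namespace="default", cluster_domain="cluster.local"):
    """Build the common in-cluster URL forms for a Service, shortest first."""
    names = [
        service,                                          # same-namespace short name
        f"{service}.{namespace}",                         # name plus namespace
        f"{service}.{namespace}.svc",                     # name, namespace, svc
        f"{service}.{namespace}.svc.{cluster_domain}",    # fully qualified name
    ]
    return [f"http://{name}:{port}" for name in names]


for url in service_urls("bridge", 9998):
    print(url)
```

Trying the fully qualified form from the failing pod is a quick check of whether the short name fails only because of search-domain expansion.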
Answer 2 (score: 0)
You define "targetPort: 8004", so you are publishing the service on this port. Why are you curling the service on a different port, 9998?