Airflow with Kubernetes - Airflow spawns Kube pods, but the pods don't do anything

Asked: 2020-02-07 22:29:46

Tags: python kubernetes airflow airflow-scheduler

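This is mostly my first time setting up Airflow and working with K8s, so I'm just trying to get it running locally and run a few simple tasks in a small DAG. I have this working fine with other executors, but since I want to use the K8s features in production, I'm trying to set it up locally.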

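The setup is quite simple: a generic test DAG that runs fine with other executors, and a relatively untouched Airflow config file. The main things to note are that I'm using KubernetesExecutor, a postgresql + psycopg2 SQLAlchemy backend, and in_cluster set to False, since we're not running Airflow itself inside K8s. Everything else is standard.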
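For reference, the settings described above can be sketched as a config fragment and read back with Python's standard configparser. This is only an illustration: the section names ([core], [kubernetes]) assume the Airflow 1.10.x layout, and the connection string is a placeholder, not the real one.

```python
import configparser

# Hypothetical airflow.cfg fragment. The user, password, host and database
# name in the connection string are placeholders, not real values.
AIRFLOW_CFG = """
[core]
executor = KubernetesExecutor
sql_alchemy_conn = postgresql+psycopg2://user:password@localhost:5432/airflow

[kubernetes]
in_cluster = False
"""


def summarize_config(cfg_text):
    """Return the three settings mentioned in the question as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    conn = parser.get("core", "sql_alchemy_conn")
    return {
        "executor": parser.get("core", "executor"),
        # Everything before "://" is the SQLAlchemy dialect+driver part.
        "backend": conn.split("://", 1)[0],
        "in_cluster": parser.getboolean("kubernetes", "in_cluster"),
    }


summary = summarize_config(AIRFLOW_CFG)
print(summary)
```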

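Airflow starts the local webserver fine along with the scheduler, and it schedules tasks when I trigger a DAG run, but the tasks get put into the queued state and never leave it. I imagine this has to do with the pod statuses I'm seeing for the tasks: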

NAME                                                                 READY   STATUS             RESTARTS   AGE
testinglocalprintingdate-00b9b3a324b04913bf98d935ae076885   0/1     InvalidImageName   0          79s
testinglocalprintingdate-2d4a912ac30c4987af69d9ce62e36989   0/1     InvalidImageName   0          81s
testinglocalprintingdate-5a655060809647c69f4258fc32d9513d   0/1     InvalidImageName   0          77s
testinglocalprintingdate-9c3ccfebb34b4d0a84d6e8f43e144e69   0/1     InvalidImageName   0          75s
testinglocalprintingdate-d1b8d59260954638b0bc018b7743985b   0/1     InvalidImageName   0          73s
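A possible angle on InvalidImageName, offered as a guess rather than a diagnosis: assuming the worker image is built by joining worker_container_repository and worker_container_tag from the [kubernetes] config section with a colon (which is my understanding of Airflow 1.10.x), leaving either one empty in a mostly default config would produce an image string Kubernetes cannot parse. The helper names below are hypothetical and the checks are deliberately simplified; they do not implement the full Docker image reference grammar.

```python
def build_image(repository, tag):
    """Join repository and tag the way this sketch assumes the executor does."""
    return "{}:{}".format(repository, tag)


def image_problems(repository, tag):
    """Return a list of obvious problems with a repository/tag pair."""
    problems = []
    if not repository.strip():
        problems.append("repository is empty")
    if not tag.strip():
        problems.append("tag is empty")
    if repository != repository.lower():
        problems.append("repository contains uppercase characters")
    if any(ch.isspace() for ch in repository + tag):
        problems.append("image contains whitespace")
    return problems


# Both settings left empty: the resulting image string is just ":".
print(repr(build_image("", "")), image_problems("", ""))

# Placeholder values, not a recommendation of a specific image.
print(image_problems("my-registry/airflow", "1.10.9"))
```

Running `kubectl describe pod <pod-name>` on one of the stuck pods shows the exact image string the pod was created with, which is the quickest way to confirm or rule this out.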

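Additionally, I'm seeing these errors every minute or so (linked to kube_client_request_args = {"_request_timeout" : [60,60] } in the Airflow config; changing the numbers from 60,60 to anything else has no effect):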

[2020-02-07 17:22:32,244] {kubernetes_executor.py:337} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 752, in read_chunked
    self._update_chunk_length()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 682, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 335, in run
    self.worker_uuid, self.kube_config)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 359, in _run
    **kwargs):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 144, in stream
    for line in iter_resp_lines(resp):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
    self._original_response.close()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='192.168.64.2', port=8443): Read timed out.
Process KubernetesJobWatcher-3:
Traceback (most recent call last):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 752, in read_chunked
    self._update_chunk_length()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 682, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 335, in run
    self.worker_uuid, self.kube_config)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 359, in _run
    **kwargs):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 144, in stream
    for line in iter_resp_lines(resp):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
    self._original_response.close()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='192.168.64.2', port=8443): Read timed out.
[2020-02-07 17:22:32,597] {kubernetes_executor.py:442} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
[2020-02-07 17:22:32,615] {kubernetes_executor.py:346} INFO - Event: and now my watch begins starting at resource_version: 0
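To make the timeout setting concrete, here is a minimal sketch of how a value like the one above can be read, assuming the config string is parsed as JSON and a two-element list is turned into a (connect, read) tuple. The function name is hypothetical and this is not Airflow's actual code; the (connect, read) interpretation is how urllib3 and the Kubernetes Python client treat a two-element timeout.

```python
import json


def parse_request_args(raw):
    """Parse a kube_client_request_args-style JSON string into a dict."""
    args = json.loads(raw) if raw else {}
    timeout = args.get("_request_timeout")
    if isinstance(timeout, list):
        # Two-element timeouts are read as (connect timeout, read timeout).
        args["_request_timeout"] = tuple(timeout)
    return args


raw_value = '{"_request_timeout" : [60,60] }'
args = parse_request_args(raw_value)
connect_timeout, read_timeout = args["_request_timeout"]
print("connect:", connect_timeout, "read:", read_timeout)
```

Under that reading, a watch stream that receives no events for 60 seconds would hit the read timeout, which would be consistent with a "Read timed out" error roughly once a minute followed by the watcher restarting.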

I've been trying to debug this for a few days now to no avail. Any help would be appreciated.

0 answers:

No answers yet