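This is my first time setting up Airflow and working with K8s, so for now I'm just trying to get it running locally and run a couple of simple tasks in a small DAG. Everything works fine with the other executors, but since I want to use the K8s functionality in production, I'm trying to set it up locally.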
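The setup is pretty simple: a generic test DAG that runs fine with the other executors, and a relatively untouched Airflow config file. The main things to note are that I'm using KubernetesExecutor, a postgresql + psycopg2 SQLAlchemy backend, and the in_cluster setting set to False, since Airflow itself is not running inside K8s. Everything else is standard.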
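Airflow starts up fine with the scheduler and the local webserver, and it schedules tasks when I trigger a DAG run, but the tasks get put into the queued state and never leave it. I assume this has something to do with the pod statuses I'm seeing for the tasks: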
NAME READY STATUS RESTARTS AGE
testinglocalprintingdate-00b9b3a324b04913bf98d935ae076885 0/1 InvalidImageName 0 79s
testinglocalprintingdate-2d4a912ac30c4987af69d9ce62e36989 0/1 InvalidImageName 0 81s
testinglocalprintingdate-5a655060809647c69f4258fc32d9513d 0/1 InvalidImageName 0 77s
testinglocalprintingdate-9c3ccfebb34b4d0a84d6e8f43e144e69 0/1 InvalidImageName 0 75s
testinglocalprintingdate-d1b8d59260954638b0bc018b7743985b 0/1 InvalidImageName 0 73s
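InvalidImageName generally means Kubernetes could not parse the image reference in the pod spec. The sketch below is a rough way to check what image a config file would produce. It assumes (this is not confirmed anywhere above) that Airflow 1.10 builds the worker image as repository:tag from the worker_container_repository and worker_container_tag options in the [kubernetes] section. SAMPLE_CFG, worker_image and looks_valid are hypothetical names used only for illustration.

```python
import configparser

# Hypothetical, trimmed-down airflow.cfg; the two worker_container_* options
# are left empty here, which is an assumption about an "untouched" config.
SAMPLE_CFG = """
[core]
executor = KubernetesExecutor

[kubernetes]
in_cluster = False
worker_container_repository =
worker_container_tag =
"""


def worker_image(cfg_text):
    """Compose the worker image as repository:tag from config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    repo = parser.get("kubernetes", "worker_container_repository", fallback="").strip()
    tag = parser.get("kubernetes", "worker_container_tag", fallback="").strip()
    return f"{repo}:{tag}"


def looks_valid(image):
    """Rough check only: both the repository and the tag must be non-empty."""
    repo, _, tag = image.rpartition(":")
    return bool(repo) and bool(tag)


if __name__ == "__main__":
    image = worker_image(SAMPLE_CFG)
    print(repr(image), "valid" if looks_valid(image) else "invalid")
```

If both options are empty, the composed reference is just a colon, which would be consistent with the status shown above. Running kubectl describe pod on one of the pods should show the actual image string that was submitted.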
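Additionally, I see these errors every minute or so. They seem to be linked to this setting in the Airflow config: kube_client_request_args = {"_request_timeout" : [60,60] }, but changing the numbers from 60,60 to anything else has no effect: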
[2020-02-07 17:22:32,244] {kubernetes_executor.py:337} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
yield
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 752, in read_chunked
self._update_chunk_length()
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 682, in _update_chunk_length
line = self._fp.fp.readline()
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 335, in run
self.worker_uuid, self.kube_config)
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 359, in _run
**kwargs):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 144, in stream
for line in iter_resp_lines(resp):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
self._original_response.close()
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 430, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='192.168.64.2', port=8443): Read timed out.
Process KubernetesJobWatcher-3:
Traceback (most recent call last):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
yield
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 752, in read_chunked
self._update_chunk_length()
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 682, in _update_chunk_length
line = self._fp.fp.readline()
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 335, in run
self.worker_uuid, self.kube_config)
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 359, in _run
**kwargs):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 144, in stream
for line in iter_resp_lines(resp):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
self._original_response.close()
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 430, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='192.168.64.2', port=8443): Read timed out.
[2020-02-07 17:22:32,597] {kubernetes_executor.py:442} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
[2020-02-07 17:22:32,615] {kubernetes_executor.py:346} INFO - Event: and now my watch begins starting at resource_version: 0
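For reference, here is a minimal sketch of how the kube_client_request_args value quoted earlier can be read. It assumes the setting is parsed as JSON and passed through to the Kubernetes Python client, which treats a two-element _request_timeout as (connect timeout, read timeout) in seconds. The parse_request_timeout helper is hypothetical.

```python
import json


def parse_request_timeout(raw):
    """Return (connect_timeout, read_timeout) from a JSON config value."""
    args = json.loads(raw)
    connect_timeout, read_timeout = tuple(args["_request_timeout"])
    return connect_timeout, read_timeout


if __name__ == "__main__":
    raw = '{"_request_timeout" : [60,60] }'
    connect_timeout, read_timeout = parse_request_timeout(raw)
    print(f"connect={connect_timeout}s read={read_timeout}s")
```

Under that reading, a 60 second read timeout on an idle watch stream would line up with a ReadTimeoutError roughly once a minute, though this does not explain why changing the value had no visible effect.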
I've been trying to debug this for a few days to no avail. Any help would be appreciated.