Airflow scheduler fails to start with exceptions when parallelism is set to a large number

Asked: 2019-11-20 03:37:56

Tags: python-3.x multiprocessing airflow airflow-scheduler

I am new to Airflow and am trying to set up a data pipeline with it, but it keeps raising exceptions. My airflow.cfg looks like this:

executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost/airflow
sql_alchemy_pool_size = 5
parallelism = 96
dag_concurrency = 96
worker_concurrency = 96
max_threads = 96
broker_url = postgresql+psycopg2://airflow:airflow@localhost/airflow
result_backend = postgresql+psycopg2://airflow:airflow@localhost/airflow

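When I start airflow webserver -p 8080 in one terminal and then airflow scheduler in another, the scheduler fails with the exception below. It fails only when I set parallelism to a larger number and works fine otherwise. This may be machine-specific, but at least we know it is triggered by the parallelism setting. I have tried running 1000 Python processes on this computer and that works fine, and I have configured Postgres to allow up to 500 database connections, but I still get the error.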

[2019-11-20 12:15:00,820] {dag_processing.py:556} INFO - Launched DagFileProcessorManager with pid: 85050
Process QueuedLocalWorker-18:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 811, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/Users/edward/.local/share/virtualenvs/avat-utils-JpGzQGRW/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 111, in run
    key, command = self.task_queue.get()
  File "<string>", line 2, in get
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 815, in _callmethod
    self._connect()
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 802, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused

Thanks.

Update: I tried running it in PyCharm and it works fine there, but in the terminal it sometimes fails and sometimes does not.

2 Answers:

Answer 0 (score: 0)

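A few days ago I found out that Airflow actually starts all of the parallel processes at startup. I had treated the max_* settings (such as max_threads) and parallelism as capacity limits, but they are the number of processes that will be running from the moment it starts. So this problem appears to be caused by the machine running out of resources.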
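The behaviour described in this answer, where every worker process is created at startup rather than on demand, can be illustrated with a minimal sketch. This is not Airflow's code: the names `worker` and `run_demo` are hypothetical, it uses plain multiprocessing queues instead of the manager-backed queue visible in the traceback, and it assumes a Unix-like system where the "fork" start method is available.

```python
import multiprocessing
import os

# Assumption: a Unix-like system. The "fork" start method does not exist on Windows.
ctx = multiprocessing.get_context("fork")


def worker(task_queue, result_queue):
    """Announce startup, then consume tasks until a None sentinel arrives."""
    result_queue.put(("started", os.getpid()))
    while True:
        task = task_queue.get()
        if task is None:
            break
        result_queue.put(("done", task))


def run_demo(parallelism, tasks):
    """Start `parallelism` workers up front, then feed them `tasks`."""
    task_queue = ctx.Queue()
    result_queue = ctx.Queue()

    workers = [
        ctx.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(parallelism)
    ]
    for w in workers:
        w.start()

    # No task has been queued yet, but every worker process already exists.
    started_pids = {result_queue.get()[1] for _ in range(parallelism)}

    for task in tasks:
        task_queue.put(task)
    for _ in workers:
        task_queue.put(None)  # one stop sentinel per worker

    done = sorted(result_queue.get()[1] for _ in tasks)
    for w in workers:
        w.join()
    return len(started_pids), done


if __name__ == "__main__":
    started, done = run_demo(parallelism=4, tasks=["a", "b", "c"])
    print(f"{started} worker processes were started for {len(done)} tasks")
```

With this pattern the process count depends only on the configured number, not on how much work is queued, which is why a value such as 96 costs resources even when the scheduler is idle.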

Answer 1 (score: 0)

I had the same problem. It turned out that I had set max_threads = 10 in airflow.cfg together with LocalExecutor. Switching to max_threads = 2 solved the problem.
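As a rough way to spot settings like this before starting the scheduler, here is a small sketch that reads an airflow.cfg fragment and lists process-related values that exceed a given CPU count. The function name `oversized_settings`, the `WATCHED` list, and the "larger than the CPU count" threshold are all assumptions made for illustration, not an Airflow rule. The section names assume the Airflow 1.10 layout, where parallelism and dag_concurrency sit under [core] and max_threads under [scheduler].

```python
import configparser
import os

# Assumed Airflow 1.10 layout; the question's excerpt does not show section headers.
SAMPLE_CFG = """
[core]
executor = LocalExecutor
parallelism = 96
dag_concurrency = 96

[scheduler]
max_threads = 96
"""

# Hypothetical list of settings worth checking.
WATCHED = (
    ("core", "parallelism"),
    ("core", "dag_concurrency"),
    ("scheduler", "max_threads"),
)


def oversized_settings(cfg_text, cpu_count):
    """Return watched settings whose integer value exceeds `cpu_count`."""
    # Interpolation is disabled because airflow.cfg values may contain '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    flagged = {}
    for section, option in WATCHED:
        if parser.has_option(section, option):
            value = parser.getint(section, option)
            if value > cpu_count:
                flagged[f"{section}.{option}"] = value
    return flagged


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    for name, value in oversized_settings(SAMPLE_CFG, cpus).items():
        print(f"{name} = {value} exceeds the {cpus} available CPUs")
```

A flagged value is not necessarily wrong; it is only a hint to try a smaller number, as this answer did by moving from 10 to 2.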
