I'm new to Airflow. I'm trying to build a data pipeline with Airflow, but it keeps throwing exceptions. My airflow.cfg looks like this:
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost/airflow
sql_alchemy_pool_size = 5
parallelism = 96
dag_concurrency = 96
worker_concurrency = 96
max_threads = 96
broker_url = postgresql+psycopg2://airflow:airflow@localhost/airflow
result_backend = postgresql+psycopg2://airflow:airflow@localhost/airflow
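When I start airflow webserver -p 8080 in one terminal and then airflow scheduler in another terminal, the scheduler fails with the following exception. It fails when I set parallelism to a large number and works fine otherwise. This may be machine-specific, but at least we know it is caused by parallelism. I have tried running 1000 Python processes on the machine and that works fine, and I configured Postgres to allow up to 500 database connections, but I still get the error.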
[2019-11-20 12:15:00,820] {dag_processing.py:556} INFO - Launched DagFileProcessorManager with pid: 85050
Process QueuedLocalWorker-18:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 811, in _callmethod
conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/edward/.local/share/virtualenvs/avat-utils-JpGzQGRW/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 111, in run
key, command = self.task_queue.get()
File "<string>", line 2, in get
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 815, in _callmethod
self._connect()
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 802, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 492, in Client
c = SocketClient(address)
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused
Thanks
Update: I tried running it in PyCharm and it runs fine there, but in the terminal it sometimes fails and sometimes doesn't.
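Answer 0 (score: 0)
I found out a few days ago that Airflow actually starts all of the parallel processes at startup. I had been treating max_sth and parallelism as capacity limits, but they are the number of processes it runs as soon as it starts. So it looks like this problem is caused by the machine running out of resources.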
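To make the behaviour described above concrete, here is a minimal standalone sketch using plain Python multiprocessing. It is not Airflow's actual code, and the names run_pool and worker are hypothetical. It mimics the pattern visible in the traceback: a fixed number of worker processes (the parallelism) is started up front, and each one blocks on a queue owned by a multiprocessing Manager server process. It assumes a POSIX system where the fork start method is available.

```python
import multiprocessing as mp

# Assumption: POSIX system; "fork" mirrors the macOS / Python 3.7 setup in the question.
CTX = mp.get_context("fork")


def worker(task_queue, result_queue):
    """Hypothetical worker: block on the shared queue until a None sentinel arrives."""
    while True:
        # In a forked child, the first get() opens a new socket connection to the Manager.
        item = task_queue.get()
        if item is None:
            break
        result_queue.put(item * 2)  # stand-in for "run the task command"


def run_pool(parallelism, tasks):
    """Start `parallelism` workers up front, feed them tasks, return sorted results."""
    manager = CTX.Manager()  # separate server process that owns both queues
    task_queue = manager.Queue()
    result_queue = manager.Queue()
    workers = [
        CTX.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(parallelism)
    ]
    for w in workers:
        w.start()  # every worker process exists before any task is queued
    for t in tasks:
        task_queue.put(t)
    for _ in workers:
        task_queue.put(None)  # one stop sentinel per worker
    for w in workers:
        w.join()
    results = sorted(result_queue.get() for _ in tasks)
    manager.shutdown()
    return results


if __name__ == "__main__":
    print(run_pool(4, [1, 2, 3]))  # [2, 4, 6]
```

Each forked worker opens its own connection to the Manager on its first get() call, which is the _connect path shown in the traceback. With a large parallelism, all of those processes and connections are created at once.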
Answer 1 (score: 0)
I had the same problem. It turned out I had set max_threads = 10 in airflow.cfg together with LocalExecutor. Changing it to max_threads = 2 solved the problem.
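Both answers point at the same settings: parallelism, which controls how many LocalExecutor worker processes are started, and max_threads. As a rough back-of-the-envelope sketch, the hypothetical helper below reads those two values from an airflow.cfg-style string and adds them up, plus one Manager server process. It assumes the Airflow 1.10 layout where parallelism sits in [core] and max_threads sits in [scheduler], and it treats max_threads as an upper bound on DAG-file processor processes. This is an estimate, not a number Airflow reports.

```python
import configparser

# Assumed airflow.cfg section layout (Airflow 1.10.x), using the values from the question.
SAMPLE_CFG = """
[core]
executor = LocalExecutor
parallelism = 96
dag_concurrency = 96

[scheduler]
max_threads = 96
"""


def estimate_startup_processes(cfg_text: str) -> int:
    """Hypothetical rough count of processes the scheduler may fork with LocalExecutor.

    parallelism worker processes started up front
    + up to max_threads DAG-file processor processes
    + 1 multiprocessing Manager server process that owns the task queue.
    """
    # interpolation=None so '%' characters in real airflow.cfg files are not interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    parallelism = parser.getint("core", "parallelism")
    max_threads = parser.getint("scheduler", "max_threads")
    return parallelism + max_threads + 1


print(estimate_startup_processes(SAMPLE_CFG))  # 193
```

With the values from the question (parallelism = 96, max_threads = 96), the estimate is 193 processes for the scheduler alone.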