我在Amazon EC2实例上运行的使用Airflow版本1.9的工作环境正常。我需要升级到最新版本的Airflow,即1.10。我可以选择从1.9版升级,也可以在新服务器上全新安装1.10。 Pip中未列出Airflow版本1.10,因此我是通过此命令从Git安装的,
y
此命令已成功安装Airflow版本1.10。您可以通过运行命令pip-3.6 install git+git://github.com/apache/incubator-airflow.git@v1-10-stable
并查看输出来查看,
airflow version
当我尝试使用 ____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
v1.10.0
启动Airflow Scheduler时,出现以下异常,
airflow scheduler
这是我的lib文件夹中的内容,
ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,195] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,195] {celery_executor.py:113} ERROR - No module named 'MySQLdb'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 94, in sync
state = task.state
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 471, in state
return self._get_task_meta()['status']
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 410, in _get_task_meta
return self._maybe_set_cache(self.backend.get_task_meta(self.id))
File "/usr/local/lib/python3.6/site-packages/celery/backends/base.py", line 365, in get_task_meta
meta = self._get_task_meta_for(task_id)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 53, in _inner
return fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 122, in _get_task_meta_for
session = self.ResultSession()
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 99, in ResultSession
**self.engine_options)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 59, in session_factory
engine, session = self.create_session(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 45, in create_session
engine = self.get_engine(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 42, in get_engine
return create_engine(dburi, poolclass=NullPool)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 391, in create_engine
return strategy.create(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 80, in create
dbapi = dialect_cls.dbapi(**dbapi_args)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 110, in dbapi
return __import__('MySQLdb')
ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,196] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,196] {celery_executor.py:113} ERROR - No module named 'MySQLdb'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 94, in sync
state = task.state
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 471, in state
return self._get_task_meta()['status']
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 410, in _get_task_meta
return self._maybe_set_cache(self.backend.get_task_meta(self.id))
File "/usr/local/lib/python3.6/site-packages/celery/backends/base.py", line 365, in get_task_meta
meta = self._get_task_meta_for(task_id)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 53, in _inner
return fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 122, in _get_task_meta_for
session = self.ResultSession()
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 99, in ResultSession
**self.engine_options)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 59, in session_factory
engine, session = self.create_session(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 45, in create_session
engine = self.get_engine(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 42, in get_engine
return create_engine(dburi, poolclass=NullPool)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 391, in create_engine
return strategy.create(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 80, in create
dbapi = dialect_cls.dbapi(**dbapi_args)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 110, in dbapi
return __import__('MySQLdb')
ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,197] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,197] {celery_executor.py:113} ERROR - No module named 'MySQLdb'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 94, in sync
state = task.state
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 471, in state
return self._get_task_meta()['status']
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 410, in _get_task_meta
return self._maybe_set_cache(self.backend.get_task_meta(self.id))
File "/usr/local/lib/python3.6/site-packages/celery/backends/base.py", line 365, in get_task_meta
meta = self._get_task_meta_for(task_id)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 53, in _inner
return fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 122, in _get_task_meta_for
session = self.ResultSession()
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 99, in ResultSession
**self.engine_options)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 59, in session_factory
engine, session = self.create_session(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 45, in create_session
engine = self.get_engine(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 42, in get_engine
return create_engine(dburi, poolclass=NullPool)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 391, in create_engine
return strategy.create(*args^C[2018-08-14 14:03:16,424] {jobs.py:1585} INFO - Exited execute loop
[2018-08-14 14:03:16,433] {jobs.py:1599} INFO - Terminating child PID: 13615
我只是感到困惑,为什么Airflow的安装未解决所有需要的依赖项。我是否正确安装了Airflow?我确实需要使用1.10版,因为1.9版存在一个重大错误,如发现here和here一样。
答案 0 :(得分:9)
执行全新安装时,可以提供许多安装额外功能(“可选依赖项”)。默认情况下,Airflow不会全部安装它们,因为有数十种,并且其中一些需要特殊的依赖项,例如Mesos或Kubernetes。
https://airflow.readthedocs.io/en/stable/installation.html#extra-packages
请注意,对于1.10.0-1.10.2,您现在需要在安装命令前加上命令或导出此环境变量:
export SLUGIFY_USES_TEXT_UNIDECODE=yes
for 1.10.3及更高版本不再需要。
1.10发布后,您将可以安装以下附加功能:
pip install apache-airflow[celery,devel,postgres]
从git安装时,用于安装Extras的pip语法要复杂一些:
pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[celery,devel,postgres]
如果您尝试安装具有MySQL支持的Airflow,则可以额外添加mysql
:
pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[mysql]
如果您确实要安装所有附加功能,则可以使用all
附加功能:
pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[all]
注意:如果您以前在PyPI上安装了apache-airflow
1.9的任何其他功能,则从GitHub安装1.10时,您需要在此处再次提供它们,因为pip不会将GitHub存储库与PyPI软件包关联。 / p>
问题
mysql
,还会收到相同的错误消息吗?