Fairly new to Airflow / Python etc., but I can't seem to work out what I need to do to resolve this issue..
Airflow is running on Puckel / Docker.
The full error is:
Broken DAG: [/usr/local/airflow/dags/xxxxx.py] No module named 'airflow.contrib.operators.gsc_to_gcs'
In the Python code I have written:
from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator
I'm guessing I need to install the gcs_to_gcs module, but I'm not sure how to do it.
Any specific instructions would be greatly appreciated :-)
Answer 0 (Score: 2)
GoogleCloudStorageToGoogleCloudStorageOperator is not available in v1.9.0, so you will have to copy the operator file from here, along with the related hook from here, and paste them into the Airflow folder at the corresponding location in your Python environment. Follow these steps:
Run the following command to find where Apache Airflow is installed on your machine:
pip show apache-airflow
It should produce output like the following in your terminal:
Name: apache-airflow
Version: 2.0.0.dev0+incubating
Summary: Programmatically author, schedule and monitor data pipelines
Home-page: http://airflow.incubator.apache.org/
Author: Apache Software Foundation
Author-email: dev@airflow.incubator.apache.org
License: Apache License 2.0
Location: /Users/kaxil/anaconda2/lib/python2.7/site-packages
Requires: iso8601, bleach, gunicorn, sqlalchemy-utc, markdown, flask-caching, alembic, croniter, flask-wtf, requests, tabulate, psutil, jinja2, gitpython, python-nvd3, sqlalchemy, dill, flask, pandas, pendulum, flask-login, funcsigs, flask-swagger, flask-admin, lxml, python-dateutil, pygments, werkzeug, tzlocal, python-daemon, setproctitle, zope.deprecation, flask-appbuilder, future, configparser, thrift
Required-by:
The path after Location: is the Apache Airflow directory.
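If you prefer not to copy that path by hand, a small shell snippet along these lines can capture it (the SITE_PACKAGES_DIR variable name is just an example, not part of the original answer; you can use it in place of LINK_TO_SITE_PACKAGES_DIR below):
# Store the Location reported by pip in a variable for the cp commands below
SITE_PACKAGES_DIR=$(pip show apache-airflow | awk '/^Location:/ {print $2}')
echo "$SITE_PACKAGES_DIR"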
Now clone the git repo to get these two files:
# Clone the git repo to `airflow-temp` folder
git clone https://github.com/apache/incubator-airflow airflow-temp
# Copy the hook from the cloned repo to where Apache Airflow is located
# Replace LINK_TO_SITE_PACKAGES_DIR with the path you found above
cp airflow-temp/airflow/contrib/hooks/gcs_hook.py LINK_TO_SITE_PACKAGES_DIR/airflow/contrib/hooks/
# For example: for me, it would be
cp airflow-temp/airflow/contrib/hooks/gcs_hook.py /Users/kaxil/anaconda2/lib/python2.7/site-packages/airflow/contrib/hooks/
# Do the same with operator file
cp airflow-temp/airflow/contrib/operators/gcs_to_gcs.py LINK_TO_SITE_PACKAGES_DIR/airflow/contrib/operators/
# For example: for me, it would be
cp airflow-temp/airflow/contrib/operators/gcs_to_gcs.py /Users/kaxil/anaconda2/lib/python2.7/site-packages/airflow/contrib/operators/
Re-run the Airflow webserver and scheduler, and it should now work.
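How you restart depends on your deployment (with the Puckel Docker image, restarting the container has the same effect); on a plain local install it would typically look like this, treated as a rough sketch rather than exact instructions:
# Restart the Airflow services (stop the running processes first)
airflow webserver
airflow scheduler
# Optional sanity check that the operator can now be imported
python -c "from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator"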
Answer 1 (Score: 0)
I know this is an old question, but since Cloud Composer still did not support GoogleCloudStorageToGoogleCloudStorageOperator, I tried to use the same operator and received the same message.
I managed to solve what I needed with a simple BashOperator:
from datetime import timedelta

from airflow import models
from airflow.operators.bash_operator import BashOperator

# dag_name and default_dag_args are assumed to be defined elsewhere in the DAG file.
with models.DAG(
        dag_name,
        schedule_interval=timedelta(days=1),
        default_args=default_dag_args) as dag:

    # Use gsutil via a BashOperator instead of the unavailable operator
    copy_files = BashOperator(
        task_id='copy_files',
        bash_command='gsutil -m cp <Source Bucket> <Destination Bucket>'
    )
Very simple, and it lets you create folders and rename files as needed.
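For example, renaming an object while copying it only depends on the gsutil command you pass in (the bucket and object names below are made up for illustration):
# Copy one object into a different "folder" and give it a new name
gsutil cp gs://source-bucket/data/report.csv gs://destination-bucket/archive/report_renamed.csv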