Broken DAG: No module named 'airflow.contrib.gsc_to_gcs'

Date: 2018-05-23 01:58:55

Tags: python docker airflow

I'm quite new to Airflow/Python etc., but I can't seem to work out what I need to do to resolve this issue.

Airflow is running on Puckel/Docker.

The full error is:

Broken DAG: [/usr/local/airflow/dags/xxxxx.py] No module named 'airflow.contrib.operators.gsc_to_gcs'

In the Python code, I have written:

from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator

I'm guessing I need to install the gcs_to_gcs module, but I don't know how to do that.

Any specific instructions would be greatly appreciated :-)

2 Answers:

Answer 0 (score: 2)

GoogleCloudStorageToGoogleCloudStorageOperator is not available in v1.9.0, so you will have to copy the operator file and the related hook from the incubator-airflow repository and paste them into the matching locations inside the Airflow folder in your Python environment. Follow the steps below:

Run the following command to find where Apache Airflow is stored on your machine:

pip show apache-airflow

It should produce output like the following in your terminal:

Name: apache-airflow
Version: 2.0.0.dev0+incubating
Summary: Programmatically author, schedule and monitor data pipelines
Home-page: http://airflow.incubator.apache.org/
Author: Apache Software Foundation
Author-email: dev@airflow.incubator.apache.org
License: Apache License 2.0
Location: /Users/kaxil/anaconda2/lib/python2.7/site-packages
Requires: iso8601, bleach, gunicorn, sqlalchemy-utc, markdown, flask-caching, alembic, croniter, flask-wtf, requests, tabulate, psutil, jinja2, gitpython, python-nvd3, sqlalchemy, dill, flask, pandas, pendulum, flask-login, funcsigs, flask-swagger, flask-admin, lxml, python-dateutil, pygments, werkzeug, tzlocal, python-daemon, setproctitle, zope.deprecation, flask-appbuilder, future, configparser, thrift
Required-by:

The path after Location: is the Apache Airflow directory.
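If you prefer, you can get the same path from Python itself. A minimal sketch, assuming the airflow package is importable in your current environment:

    # Print the directory of the installed airflow package
    # (a subdirectory of the Location shown by pip above).
    import os
    import airflow

    print(os.path.dirname(airflow.__file__))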

Now clone the git repo to get the two files:

# Clone the git repo to `airflow-temp` folder
git clone https://github.com/apache/incubator-airflow airflow-temp

# Copy the hook from the cloned repo to where Apache Airflow is located
# Replace LINK_TO_SITE_PACKAGES_DIR with the path you found above
cp airflow-temp/airflow/contrib/hooks/gcs_hook.py LINK_TO_SITE_PACKAGES_DIR/airflow/contrib/hooks/

# For example: for me, it would be 
cp airflow-temp/airflow/contrib/hooks/gcs_hook.py /Users/kaxil/anaconda2/lib/python2.7/site-packages/airflow/contrib/hooks/

# Do the same with operator file
cp airflow-temp/airflow/contrib/operators/gcs_to_gcs.py LINK_TO_SITE_PACKAGES_DIR/airflow/contrib/operators/

# For example: for me, it would be 
cp airflow-temp/airflow/contrib/operators/gcs_to_gcs.py /Users/kaxil/anaconda2/lib/python2.7/site-packages/airflow/contrib/operators/
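Once both files are copied, the import from the question should resolve. A minimal sketch of a DAG using the operator; the bucket names, object path, and schedule below are placeholder assumptions, not values from the original post:

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.gcs_to_gcs import (
        GoogleCloudStorageToGoogleCloudStorageOperator,
    )

    # Hypothetical bucket/object names for illustration only.
    with DAG(dag_id='gcs_to_gcs_example',
             start_date=datetime(2018, 5, 1),
             schedule_interval=None) as dag:
        copy_files = GoogleCloudStorageToGoogleCloudStorageOperator(
            task_id='copy_files',
            source_bucket='my-source-bucket',
            source_object='data/*.csv',
            destination_bucket='my-destination-bucket',
        )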

Rerun airflow webserver and airflow scheduler, and it should now work.

Answer 1 (score: 0)

I know this is an old question, but since Cloud Composer still did not support GoogleCloudStorageToGoogleCloudStorageOperator, I tried to use the same operator and got the same message.

I managed to solve what I needed with a simple BashOperator:

from datetime import timedelta

from airflow import models
from airflow.operators.bash_operator import BashOperator

# dag_name and default_dag_args are assumed to be defined earlier in the file.
with models.DAG(
        dag_name,
        schedule_interval=timedelta(days=1),
        default_args=default_dag_args) as dag:

    copy_files = BashOperator(
        task_id='copy_files',
        bash_command='gsutil -m cp <Source Bucket> <Destination Bucket>'
    )

Very straightforward, and you can create folders and rename files as needed.
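For example, renaming an object while moving it is just a different gsutil invocation. A hedged sketch that would sit inside the same DAG context as above; the bucket and file names are hypothetical:

    # Hypothetical example: move and rename a single object.
    rename_file = BashOperator(
        task_id='rename_file',
        bash_command='gsutil mv gs://my-bucket/old_name.csv gs://my-bucket/new_name.csv'
    )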