Google Cloud Composer DAG relative directory in the GCS bucket

Date: 2018-09-09 18:31:30

Tags: output google-cloud-storage airflow google-cloud-composer

What is the correct way for a simple DAG to access the root folder of the Composer instance's GCS bucket, or any other Airflow folder such as /data, in order to save a task's output file? Here is the DAG:

import logging
from os import path
from datetime import datetime

import numpy as np
import pandas as pd
from airflow import models
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator


def write_to_file():
    df = pd.DataFrame(data=np.random.randint(low=0, high=10, size=(5, 5)),
                      columns=['a', 'b', 'c', 'd', 'e'])
    logging.info("Saving results")

    # Relative path: where should this point so the file lands in the bucket?
    file_path = path.join("output.csv")

    df.to_csv(path_or_buf=file_path, index=False)


with models.DAG(dag_id='write_to_file',
                schedule_interval='*/10 * * * *',
                default_args={'depends_on_past': False,
                              'start_date': datetime(2018, 9, 8)}) as dag:
    t_start = DummyOperator(task_id='start')

    t_write = PythonOperator(
        task_id='write',
        python_callable=write_to_file
    )

    t_end = DummyOperator(task_id='end')

    t_start >> t_write >> t_end

Is there some environment variable set for this, or should I use a GCS hook?
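For context, using the hook directly would look roughly like this (a sketch assuming the Airflow 1.x contrib GoogleCloudStorageHook and its default GCP connection; my-composer-bucket and write_via_hook are placeholder names, not from the DAG above):

import numpy as np
import pandas as pd
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook


def write_via_hook():
    df = pd.DataFrame(data=np.random.randint(low=0, high=10, size=(5, 5)),
                      columns=['a', 'b', 'c', 'd', 'e'])

    # Write to local disk on the worker first, then upload explicitly.
    local_path = "/tmp/output.csv"
    df.to_csv(path_or_buf=local_path, index=False)

    # Uses the default google_cloud_default connection;
    # 'my-composer-bucket' is a placeholder for the real bucket name.
    hook = GoogleCloudStorageHook()
    hook.upload(bucket='my-composer-bucket',
                object='data/output.csv',
                filename=local_path)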

1 answer:

Answer 0 (score: 0):

I got the answer on the Composer mailing list: "If you save operator output data to /home/airflow/gcs/data, it will automatically be synced to gs://{composer-bucket}/data."
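So the only change needed in the DAG above is the target path. A minimal rewrite of write_to_file (only the mapped directory is new; everything else is unchanged):

def write_to_file():
    df = pd.DataFrame(data=np.random.randint(low=0, high=10, size=(5, 5)),
                      columns=['a', 'b', 'c', 'd', 'e'])
    logging.info("Saving results")

    # /home/airflow/gcs/data on the Composer workers is synced
    # automatically to gs://{composer-bucket}/data.
    file_path = path.join("/home/airflow/gcs/data", "output.csv")

    df.to_csv(path_or_buf=file_path, index=False)

The file then shows up in the bucket as gs://{composer-bucket}/data/output.csv.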