What is the correct way, from a simple DAG, to write a task's output file to the root of a Composer instance's GCS bucket, or to any of the other airflow folders there (such as /data)? For example:
import logging
from os import path
from datetime import datetime

import numpy as np
import pandas as pd

from airflow import models
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator


def write_to_file():
    df = pd.DataFrame(data=np.random.randint(low=0, high=10, size=(5, 5)),
                      columns=['a', 'b', 'c', 'd', 'e'])
    logging.info("Saving results")
    # Relative path: this lands in the worker's current working directory,
    # not in the GCS bucket.
    file_path = path.join("output.csv")
    df.to_csv(path_or_buf=file_path, index=False)


with models.DAG(dag_id='write_to_file',
                schedule_interval='*/10 * * * *',
                default_args={'depends_on_past': False,
                              'start_date': datetime(2018, 9, 8)}) as dag:
    t_start = DummyOperator(task_id='start')
    t_write = PythonOperator(
        task_id='write',
        python_callable=write_to_file
    )
    t_end = DummyOperator(task_id='end')

    t_start >> t_write >> t_end
Is there some environment variable that gets set, or should I use a GCS hook?
Answer (score: 0):
I got an answer on the Composer mailing list: "If you save the operator's output data to /home/airflow/gcs/data, it will be synced automatically to gs://{composer-bucket}/data."
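
For illustration, here is a minimal sketch of how write_to_file could be adapted per that answer. It assumes a Composer environment, where /home/airflow/gcs/data is the directory that Composer keeps synced with the bucket's data folder; that path does not exist on a plain Airflow install:

def write_to_file():
    # Sketch only: on Composer workers, files written under
    # /home/airflow/gcs/data are synced to gs://{composer-bucket}/data.
    df = pd.DataFrame(data=np.random.randint(low=0, high=10, size=(5, 5)),
                      columns=['a', 'b', 'c', 'd', 'e'])
    logging.info("Saving results")
    file_path = path.join("/home/airflow/gcs/data", "output.csv")
    df.to_csv(path_or_buf=file_path, index=False)

Alternatively, the file could be uploaded explicitly with a GCS hook (in Airflow 1.x, airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook exposes an upload(bucket, object, filename) method), which works for any bucket rather than only the environment's own.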