I have the following Airflow DAG:
start_task = DummyOperator(task_id='start_task', dag=dag)
gcs_export_uri_template = 'adstest/2018/08/31/*'
update_bigquery = GoogleCloudStorageToBigQueryOperator(
    dag=dag,
    task_id='load_ads_to_BigQuery',
    bucket=GCS_BUCKET_ID,
    destination_project_dataset_table=table_name_template,
    source_format='CSV',
    source_objects=[gcs_export_uri_template],
    schema_fields=dc(),
    create_disposition='CREATE_IF_NEEDED',
    write_disposition='WRITE_APPEND',
    skip_leading_rows=1,
    google_cloud_storage_conn_id=CONNECTION_ID,
    bigquery_conn_id=CONNECTION_ID
)
start_task >> update_bigquery
This loads the data from adstest/2018/08/31/* into BigQuery and it works fine.
I want to modify the DAG so that it runs based on the execution date and covers these dates:
Execution date
Execution date - 1 day
Execution date - 2 days
For example, if the execution date is 2018-09-02, I want the DAG to read from:
Execution date: adstest/2018/09/02/*
Execution date - 1 day: adstest/2018/09/01/*
Execution date - 2 days: adstest/2018/08/31/*
How can I do that?
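To make the intent concrete, here is a minimal plain-Python sketch (not Airflow code) of the date arithmetic being asked for; the execution date below is just the example from the question:

from datetime import datetime, timedelta

# Example execution date from the question; in Airflow this would come from the task context.
execution_date = datetime(2018, 9, 2)

# Build the three GCS prefixes: execution date, minus 1 day, minus 2 days.
prefixes = [
    'adstest/{}/*'.format((execution_date - timedelta(days=d)).strftime('%Y/%m/%d'))
    for d in range(3)
]
# prefixes == ['adstest/2018/09/02/*', 'adstest/2018/09/01/*', 'adstest/2018/08/31/*']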
Edit: Here is my updated code:
for i in range(5, 0, -1):
    gcs_export_uri_template = ['''adstest/{{ macros.ds_format(macros.ds_add(ds, -{0}), '%Y-%m-%d', '%Y/%m/%d') }}/*'''.format(i)]

    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        dag=dag,
        task_id='load_ads_to_BigQuery-{}'.format(i),
        bucket=GCS_BUCKET_ID,
        destination_project_dataset_table=table_name_template,
        source_format='CSV',
        source_objects=gcs_export_uri_template,
        schema_fields=dc(),
        create_disposition='CREATE_IF_NEEDED',
        write_disposition='WRITE_APPEND',
        skip_leading_rows=1,
        google_cloud_storage_conn_id=CONNECTION_ID,
        bigquery_conn_id=CONNECTION_ID
    )

    start_task >> update_bigquery
Edit 2:
My code:
for i in range(5, 0, -1):
    gcs_export_uri_template = ['''adstest/{{ macros.ds_format(macros.ds_add(ds, -params.i), '%Y-%m-%d', '%Y/%m/%d') }}/*'''.format(i)]

    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        dag=dag,
        task_id='load_ads_to_BigQuery-{}'.format(i),
        bucket=GCS_BUCKET_ID,
        destination_project_dataset_table=table_name_template,
        source_format='CSV',
        source_objects=gcs_export_uri_template,
        schema_fields=dc(),
        params={'i': i},
        create_disposition='CREATE_IF_NEEDED',
        write_disposition='WRITE_APPEND',
        skip_leading_rows=1,
        google_cloud_storage_conn_id=CONNECTION_ID,
        bigquery_conn_id=CONNECTION_ID
    )
This code gives the following error:
"Source URI must not contain the ',' character: gs://adstest/{ macros.ds_format(macros.ds_add(ds, -params.i), '%Y-%m-%d', '%Y/%m/%d') }/*">
Answer (score: 2):
You can use Airflow macros to achieve this, as shown below:
gcs_export_uri_template = [
    "adstest/{{ macros.ds_format(ds, '%Y-%m-%d', '%Y/%m/%d') }}/*",
    "adstest/{{ macros.ds_format(prev_ds, '%Y-%m-%d', '%Y/%m/%d') }}/*",
    "adstest/{{ macros.ds_format(macros.ds_add(ds, -2), '%Y-%m-%d', '%Y/%m/%d') }}/*"
]

update_bigquery = GoogleCloudStorageToBigQueryOperator(
    dag=dag,
    task_id='load_ads_to_BigQuery',
    bucket=GCS_BUCKET_ID,
    destination_project_dataset_table=table_name_template,
    source_format='CSV',
    source_objects=gcs_export_uri_template,
    schema_fields=dc(),
    create_disposition='CREATE_IF_NEEDED',
    write_disposition='WRITE_APPEND',
    skip_leading_rows=1,
    google_cloud_storage_conn_id=CONNECTION_ID,
    bigquery_conn_id=CONNECTION_ID
)
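If it helps to see what those templates render to, here is a rough plain-Python emulation of the two macros for the example execution date; ds_format and ds_add below are stand-ins for Airflow's macros.ds_format and macros.ds_add, and the ds / prev_ds values are assumed from the question's example:

from datetime import datetime, timedelta

def ds_format(ds, in_fmt, out_fmt):
    # stand-in for macros.ds_format: re-format a date string
    return datetime.strptime(ds, in_fmt).strftime(out_fmt)

def ds_add(ds, days):
    # stand-in for macros.ds_add: shift a YYYY-MM-DD date string by N days
    return (datetime.strptime(ds, '%Y-%m-%d') + timedelta(days=days)).strftime('%Y-%m-%d')

ds, prev_ds = '2018-09-02', '2018-09-01'  # assumed template values for this run
print('adstest/{}/*'.format(ds_format(ds, '%Y-%m-%d', '%Y/%m/%d')))              # adstest/2018/09/02/*
print('adstest/{}/*'.format(ds_format(prev_ds, '%Y-%m-%d', '%Y/%m/%d')))         # adstest/2018/09/01/*
print('adstest/{}/*'.format(ds_format(ds_add(ds, -2), '%Y-%m-%d', '%Y/%m/%d')))  # adstest/2018/08/31/*

Note that prev_ds is the previous scheduled execution date, so it only equals "execution date - 1 day" when the DAG runs on a daily schedule.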
When you run the above code, you can check the rendered template in the Web UI.
For the edited question: you need to pass the loop variable's value through the params argument, reference it inside the template string as params.i, and drop the trailing .format(i) call, as shown below:
for i in range(5, 0, -1):
    gcs_export_uri_template = ["adstest/{{ macros.ds_format(macros.ds_add(ds, -params.i), '%Y-%m-%d', '%Y/%m/%d') }}/*"]

    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        dag=dag,
        task_id='load_ads_to_BigQuery-{}'.format(i),
        bucket=GCS_BUCKET_ID,
        destination_project_dataset_table=table_name_template,
        source_format='CSV',
        source_objects=gcs_export_uri_template,
        schema_fields=dc(),
        params={'i': i},
        create_disposition='CREATE_IF_NEEDED',
        write_disposition='WRITE_APPEND',
        skip_leading_rows=1,
        google_cloud_storage_conn_id=CONNECTION_ID,
        bigquery_conn_id=CONNECTION_ID
    )

    start_task >> update_bigquery
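Why dropping .format(i) matters: Python's str.format rewrites every doubled brace ({{ and }}) into a single one, so Jinja no longer sees a template and the literal text, commas included, is handed to BigQuery as the source URI, which is exactly the error quoted in Edit 2. A quick demonstration:

# str.format collapses doubled braces, which breaks a Jinja template that is formatted afterwards
template = "adstest/{{ macros.ds_add(ds, -params.i) }}/*"
print(template.format())  # adstest/{ macros.ds_add(ds, -params.i) }/*  -- no longer a valid Jinja template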