import json
import logging
from datetime import datetime, timedelta

from airflow import DAG
from airflow.utils import dates
from airflow.operators.python_operator import PythonOperator
from airflow.contrib.operators.bigquery_get_data import BigQueryGetDataOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': dates.days_ago(2),
}

dag = DAG(
    dag_id='bigQueryPipeline',
    default_args=default_args,
    schedule_interval='0 0 * * *'
)

# Fetch rows from the table; max_results is supposed to cap how many come back.
t1 = BigQueryGetDataOperator(
    task_id='bigquery_test',
    dataset_id='<my-dataset-name>',
    table_id='<my-table-id>',
    max_results='2',
    dag=dag
)


def print_context(**context):
    # Pull whatever bigquery_test pushed to XCom and log it as JSON.
    xcom_pull = context['ti'].xcom_pull(task_ids='bigquery_test')
    logging.info('logging %s', json.dumps(xcom_pull))


t2 = PythonOperator(
    task_id='print_result',
    python_callable=print_context,
    provide_context=True,
    dag=dag
)

t1 >> t2

if __name__ == "__main__":
    dag.cli()
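So, this is my DAG. I'm testing fetching data from a BigQuery table. Everything works except the max_results parameter (which is listed in the docs).
This is what I see in the logs: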
[2019-11-26 14:46:02,272] {bigquery_get_data.py:92} INFO - Fetching Data from:
[2019-11-26 14:46:02,272] {bigquery_get_data.py:94} INFO - Dataset: <my-dataset> ; Table: <my-table> ; Max Results: 2
[2019-11-26 14:46:02,291] {logging_mixin.py:112} INFO - [2019-11-26 14:46:02,291] {gcp_api_base_hook.py:145} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
[2019-11-26 14:46:02,309] {logging_mixin.py:112} INFO - [2019-11-26 14:46:02,309] {discovery.py:271} INFO - URL being requested: GET https://www.googleapis.com/discovery/v1/apis/bigquery/v2/rest
[2019-11-26 14:46:02,412] {logging_mixin.py:112} INFO - [2019-11-26 14:46:02,412] {discovery.py:867} INFO - URL being requested: GET https://bigquery.googleapis.com/bigquery/v2/projects/<my-project>/datasets/<my-dataset>tables/<my-table>/data?maxResults=2&alt=json
[2019-11-26 14:46:02,851] {bigquery_get_data.py:106} INFO - Total Extracted rows: 77374
Note the "Max Results: 2" on the second line and the "?maxResults=2" query string on the fifth line. Despite that, the last line reports "Total Extracted rows: 77374".
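One way to check whether the limit is really being ignored is to count the rows the downstream task actually receives from XCom, instead of relying on the log line. In the BigQuery tabledata.list response, totalRows is documented as the row count of the whole table, so if the operator logs that field (an assumption, not verified against the operator source), 77374 could just be the table size. Below is a minimal sketch, assuming the operator pushes a list of rows to XCom; summarize_xcom_rows is a hypothetical helper name, not part of Airflow.

```python
import json
import logging


def summarize_xcom_rows(rows, preview=3):
    """Return (row_count, preview_json) for a value pulled from XCom.

    Assumption: the upstream task pushed a list of rows, or nothing at all.
    """
    if rows is None:
        # Nothing was pushed, e.g. the upstream task returned no value.
        return 0, "[]"
    return len(rows), json.dumps(rows[:preview])


def print_context(**context):
    # Same callable shape as the print_result task in the DAG.
    rows = context['ti'].xcom_pull(task_ids='bigquery_test')
    count, sample = summarize_xcom_rows(rows)
    logging.info('rows received via XCom: %d, sample: %s', count, sample)
    return count
```

If the count logged here is 2, max_results is being honored and only the "Total Extracted rows" message is misleading. If it is 77374, the limit really is being dropped somewhere.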
My guess is that this might be a bug in the BigQuery API?
Does anyone know how to report it to Airflow? And to Google?
Edit: I found where to submit bug reports for Airflow.