When a Dataflow job is submitted from Airflow, Airflow never picks up the job's success status and keeps logging the following error:
{gcp_dataflow_hook.py:77} INFO - Google Cloud DataFlow job not available yet..
Airflow DAG
t2 = DataFlowPythonOperator(
    task_id='google_dataflow',
    py_file='/Users/abc/sample.py',
    gcp_conn_id='connection_id',
    dataflow_default_options={
        "project": 'Project_id',
        "runner": "DataflowRunner",
        "staging_location": 'gs://Project_id/staging',
        "temp_location": 'gs://Project_id/staging'
    }
)
sample.py
import logging

import apache_beam as beam

# PROJECT and BUCKET are defined elsewhere in the script


def run():
    argv = [
        '--project={0}'.format(PROJECT),
        '--staging_location=gs://{0}/staging/'.format(BUCKET),
        '--temp_location=gs://{0}/staging/'.format(BUCKET),
        '--runner=DataflowRunner'
    ]
    with beam.Pipeline(argv=argv) as p:
        (p | 'read_bq_table' >> beam.io.Read(beam.io.BigQuerySource(
            query='Select * from `ds.table` limit 10',
            use_standard_sql=True)))


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
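For reference, here is a sketch of run() rewritten to forward the flags Airflow appends on the command line (shown in the log further below) straight to Beam instead of rebuilding argv from scratch. I have not verified that this changes the behavior, and it assumes --project, --staging_location, and --temp_location keep arriving via dataflow_default_options:

import logging
import sys

import apache_beam as beam


def run(argv=None):
    # Forward whatever Airflow appended (e.g. --job_name=google_dataflow-...)
    # so the pipeline runs under the name Airflow expects, rather than an
    # auto-generated beamapp-... name.
    pipeline_args = argv if argv is not None else sys.argv[1:]
    with beam.Pipeline(argv=pipeline_args) as p:
        (p | 'read_bq_table' >> beam.io.Read(beam.io.BigQuerySource(
            query='Select * from `ds.table` limit 10',
            use_standard_sql=True)))


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()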
I have already read other answers on the forum and, as suggested there, removed the job name from both sample.py and the Airflow DAG, but Airflow still does not pick up a success return code.
From the Airflow logs when the job is submitted to Dataflow:
{gcp_dataflow_hook.py:116} INFO - Running command: python /Users/abc/sample.py
--runner=DataflowRunner --project=project_id --region=region_name
--labels=airflow-version=v1-10-0 --job_name=google_dataflow-f8a478ae
After the Dataflow job completes:
{gcp_dataflow_hook.py:128} WARNING - INFO:root:Job 2018-10-26_06_07_04-17336980599969256162 is in state JOB_STATE_DONE
{gcp_api_base_hook.py:90} INFO - Getting connection using a JSON key file.
{discovery.py:866} INFO - URL being requested: GET https://dataflow.googleapis.com/v1b3/projects/project_id/locations/us-central1/jobs?alt=json
{gcp_dataflow_hook.py:77} INFO - Google Cloud DataFlow job not available yet..
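If I read that last line together with the GET request above it correctly, the hook lists the jobs in the project/region and looks for one whose name matches the job name Airflow generated (google_dataflow-f8a478ae). Something like this sketch, which is my reconstruction and not the actual Airflow source:

from googleapiclient.discovery import build


def find_job(project, region, expected_name):
    # Rough sketch of the polling that seems to produce the log lines above:
    # list jobs in the project/region and match by name. Function name and
    # structure are my guess, not Airflow's gcp_dataflow_hook code.
    dataflow = build('dataflow', 'v1b3')
    response = dataflow.projects().locations().jobs().list(
        projectId=project, location=region).execute()
    for job in response.get('jobs', []):
        if job['name'] == expected_name:
            return job
    # No match: the hook keeps logging "job not available yet.."
    return None

Since the console summary below shows the job actually ran under the auto-generated name beamapp-user-1026130638-681570 rather than google_dataflow-f8a478ae, a lookup like this would never find it, which might explain the endless "not available yet" messages.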
I am not sure how to resolve this; can someone help?
Dataflow job summary from the console:
Job name: beamapp-user-1026130638-681570
Job ID: 2018-10-26_06_07_04-17336980599969256162
Region: us-central1
Job status: Succeeded
SDK version: Apache Beam SDK for Python 2.7.0