Airflow cannot get the success status from Dataflow

Date: 2018-10-26 13:28:29

Tags: python google-cloud-dataflow airflow

When a Dataflow job is submitted from Airflow, Airflow fails to pick up the job's success status and keeps logging the following message:

{gcp_dataflow_hook.py:77} INFO - Google Cloud DataFlow job not available yet..

Airflow DAG

t2 = DataFlowPythonOperator(
    task_id='google_dataflow',
    py_file='/Users/abc/sample.py',
    gcp_conn_id='connection_id',
    dataflow_default_options={
        "project": 'Project_id',
        "runner": "DataflowRunner",
        "staging_location": 'gs://Project_id/staging',
        "temp_location": 'gs://Project_id/staging'
    }
)

Sample.py

import logging

import apache_beam as beam


def run():
    argv = [
        '--project={0}'.format(PROJECT),
        '--staging_location=gs://{0}/staging/'.format(BUCKET),
        '--temp_location=gs://{0}/staging/'.format(BUCKET),
        '--runner=DataflowRunner'
    ]

    with beam.Pipeline(argv=argv) as p:
        (p | 'read_bq_table' >> beam.io.Read(beam.io.BigQuerySource(
            query='Select * from `ds.table` limit 10',
            use_standard_sql=True)))


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
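One detail visible in the logs below: Airflow runs the script with `--job_name=google_dataflow-f8a478ae`, but `run()` above builds its own `argv` and ignores `sys.argv`, so the job ends up named `beamapp-user-1026130638-681570` on Dataflow. A minimal sketch of forwarding the command-line arguments (the helper name `build_pipeline_args` is my own, not from Airflow or Beam) so the hook-supplied `--job_name` reaches the pipeline:

```python
import sys


def build_pipeline_args(cli_args, project, bucket):
    """Merge the pipeline's default options with whatever Airflow
    passed on the command line (notably --job_name)."""
    defaults = [
        '--project={0}'.format(project),
        '--staging_location=gs://{0}/staging/'.format(bucket),
        '--temp_location=gs://{0}/staging/'.format(bucket),
        '--runner=DataflowRunner',
    ]
    # Airflow's arguments go last so they win over the defaults;
    # Beam's option parser takes the later occurrence of a flag.
    return defaults + list(cli_args)


# Inside run(), one could then do (assuming PROJECT/BUCKET are defined):
#   argv = build_pipeline_args(sys.argv[1:], PROJECT, BUCKET)
#   with beam.Pipeline(argv=argv) as p:
#       ...
```

With the Airflow-supplied `--job_name` forwarded, the Dataflow job should run under the name the hook polls for, rather than an auto-generated `beamapp-...` name.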

I have already read other answers on the forum and, as suggested, removed the job name from both sample.py and the Airflow DAG, but Airflow still fails to get a success return code.

From the Airflow logs when the job is submitted to Dataflow:

{gcp_dataflow_hook.py:116} INFO - Running command:  python /Users/abc/sample.py
--runner=DataflowRunner --project=project_id --region=region_name
--labels=airflow-version=v1-10-0 --job_name=google_dataflow-f8a478ae

After the Dataflow job completes:

{gcp_dataflow_hook.py:128} WARNING - INFO:root:Job 2018-10-26_06_07_04-17336980599969256162 is in state JOB_STATE_DONE
{gcp_api_base_hook.py:90} INFO - Getting connection using a JSON key file.
{discovery.py:866} INFO - URL being requested: GET https://dataflow.googleapis.com/v1b3/projects/project_id/locations/us-central1/jobs?alt=json
{gcp_dataflow_hook.py:77} INFO - Google Cloud DataFlow job not available yet..

I'm not sure how to resolve this; can anyone help?

Dataflow job summary from the console:

Job name     beamapp-user-1026130638-681570
Job ID       2018-10-26_06_07_04-17336980599969256162
Region       us-central1
Job status   Succeeded
SDK version  Apache Beam SDK for Python 2.7.0

0 Answers:

No answers