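How to set a timeout on a Dataflow job?

Time: 2018-10-14 12:40:27

Tags: python-2.7 google-cloud-dataflow google-cloud-composer

I am using Composer to run my Dataflow pipeline on a schedule. If the job takes longer than a certain amount of time, I want it to be killed. Is there a way to do this programmatically, either as a pipeline option or as a DAG parameter?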

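1 answer:

Answer 0 (score: 1)

I'm not sure how to do this as a pipeline configuration option, but here is an idea.

You could launch a task queue task with its countdown set to your timeout value. When that task runs, it can check whether the Dataflow job is still running: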

https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/list
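
As a rough sketch of that check: the helper below assumes `service` is a Dataflow client built with `discovery.build('dataflow', 'v1b3', ...)` and that the job can be recognised by its name. The function name `find_active_job` and its parameters are made up for illustration.

```python
def find_active_job(service, project_id, job_name):
    """Return the job dict for an active job called job_name, or None.

    Only the first page of results is inspected; a project with many
    active jobs would also need to follow nextPageToken.
    """
    request = service.projects().jobs().list(
        projectId=project_id,
        filter='ACTIVE')  # restrict the listing to jobs that are still running
    response = request.execute()
    for job in response.get('jobs', []):
        if job.get('name') == job_name:
            return job
    return None
```

If this returns a job, its `id` field is what the update call needs.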

If it is, you can call update on it, setting the requested job state to JOB_STATE_CANCELLED:

https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/update

https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#jobstate
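
A minimal sketch of the cancel step, under the same assumption that `service` is a `dataflow` `v1b3` client from googleapiclient. The helper name `cancel_job_if_running` is hypothetical, and the sketch only treats `JOB_STATE_RUNNING` as "still running".

```python
def cancel_job_if_running(service, project_id, job_id):
    """Request cancellation of job_id if it is still running.

    Returns True if a cancel request was sent, False otherwise.
    """
    jobs = service.projects().jobs()
    job = jobs.get(projectId=project_id, jobId=job_id).execute()
    if job.get('currentState') != 'JOB_STATE_RUNNING':
        return False  # not running (finished, failed, cancelled, or not started)
    # projects.jobs.update takes a Job body; requestedState asks the
    # service to move the job into the given state.
    jobs.update(
        projectId=project_id,
        jobId=job_id,
        body={'requestedState': 'JOB_STATE_CANCELLED'}).execute()
    return True
```

For jobs launched against a specific regional endpoint, the `projects.locations.jobs` variants of these methods may be needed instead.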

This is done through the googleapiclient library: https://developers.google.com/api-client-library/python/apis/discovery/v1

Here is an example of how to use it:

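# Imports assumed from the names used below (not part of the original snippet).
from googleapiclient import discovery
from google.appengine.api import app_identity
from oauth2client.client import GoogleCredentials


# InterimAdminResourceHandler is the answerer's own handler base class (not shown here).
class DataFlowJobsListHandler(InterimAdminResourceHandler):

    def get(self, resource_id=None):
        """
        Wrapper to this:
        https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/list
        """
        if resource_id:
            # Listing applies to the whole collection, not a single resource.
            self.abort(405)
        else:
            credentials = GoogleCredentials.get_application_default()
            service = discovery.build('dataflow', 'v1b3', credentials=credentials)
            project_id = app_identity.get_application_id()
            # Optional ?filter= query parameter, e.g. 'ACTIVE'; defaults to 'UNKNOWN'.
            _filter = self.request.GET.pop('filter', 'UNKNOWN').upper()

            jobs_list_request = service.projects().jobs().list(
                projectId=project_id,
                filter=_filter)
            jobs_list = jobs_list_request.execute()

            return {
                '$cursor': None,
                'results': jobs_list.get('jobs', []),
            }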
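
Returning to the countdown idea from the start of this answer, here is one possible way to schedule the delayed check. It is only a sketch and assumes the code runs on App Engine standard (Python 2.7) with the `deferred` builtin enabled; `schedule_timeout_check`, `check_fn` and the `defer` parameter are names invented for this example.

```python
def schedule_timeout_check(check_fn, job_id, timeout_seconds, defer=None):
    """Enqueue check_fn(job_id) to run after timeout_seconds.

    The defer argument exists only so the function can be exercised
    outside App Engine; by default it is google.appengine.ext.deferred.defer.
    """
    if defer is None:
        # Imported lazily: this module is only available on App Engine.
        from google.appengine.ext import deferred
        defer = deferred.defer
    # _countdown delays execution of the task by that many seconds.
    return defer(check_fn, job_id, _countdown=timeout_seconds)
```

`check_fn` would be a module-level function (so that it can be pickled) that looks up the job and cancels it if it is still running.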