In Amazon Data Pipeline, how can I make sure only one pipeline instance runs at any given time?

Posted: 2018-03-26 22:50:44

Tags: amazon-web-services etl amazon-data-pipeline

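I have a pipeline with two tasks. Task 2 depends on task 1, and maxActiveInstances is set to 1 for both tasks. Despite this dependency, task 2 sometimes runs at the same time as task 1. For example, if task 2 takes too long and the scheduled start time of the pipeline's next execution arrives, task 1 of that next execution starts running while task 2 is still going. The same thing happens during backfill.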

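Because the two tasks interfere with each other, I don't want them to run concurrently under any circumstances. Ideally, I want only one instance of the whole pipeline (not of the individual tasks) to run at a time, but I can't figure out how to do that.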
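The overlap described above can be reproduced with a deliberately simplified timing model. This is a hypothetical Python sketch, not how the Data Pipeline scheduler is actually implemented: it assumes instance k is scheduled at k * period, that maxActiveInstances = 1 only makes each activity wait for its own previous run, and that the report step also waits for the copy step of the same instance. The names Run, simulate, overlaps and find_overlaps are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One execution of an activity, in minutes since first activation."""
    start: float
    end: float


def simulate(period, copy_minutes, report_minutes):
    """Hypothetical model: return the copy and report runs of each instance.

    Assumes instance k is scheduled at k * period, that each activity waits
    only for its own previous run (maxActiveInstances = 1 per activity), and
    that the report step waits for the copy step of the same instance.
    """
    copy_runs, report_runs = [], []
    for k, (copy_len, report_len) in enumerate(zip(copy_minutes, report_minutes)):
        scheduled = k * period
        prev_copy_end = copy_runs[-1].end if copy_runs else 0
        copy_start = max(scheduled, prev_copy_end)
        copy_runs.append(Run(copy_start, copy_start + copy_len))

        prev_report_end = report_runs[-1].end if report_runs else 0
        report_start = max(copy_runs[-1].end, prev_report_end)
        report_runs.append(Run(report_start, report_start + report_len))
    return copy_runs, report_runs


def overlaps(a, b):
    """True if two runs share any span of time."""
    return a.start < b.end and b.start < a.end


def find_overlaps(copy_runs, report_runs):
    """(report instance, copy instance) pairs that run at the same time."""
    return [
        (k, j)
        for k, report_run in enumerate(report_runs)
        for j, copy_run in enumerate(copy_runs)
        if j != k and overlaps(copy_run, report_run)
    ]


copy_runs, report_runs = simulate(15, [5, 5], [20, 5])
print(find_overlaps(copy_runs, report_runs))  # [(0, 1)]
```

With a 15-minute period, the second copy starts at minute 15 while the first report runs until minute 25, so report 0 and copy 1 overlap. Setting the period to 0 roughly mimics a backfill, where every scheduled time is already in the past: simulate(0, [5, 5], [5, 5]) also produces the pair (0, 1) even though each report is short. Per-activity limits alone do not serialize whole pipeline instances in this model.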

Here is what the pipeline looks like, with the uninteresting parts replaced by "...":

{
  "objects": [
    {
      "period": "15 Minutes",
      "name": "Every 15 minutes",
      "id": "DefaultSchedule",
      "type": "Schedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "...",
      "role": "...",
      "pipelineLogUri": "...",
      "scheduleType": "cron",
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "maxActiveInstances": "1",
      "name": "Default",
      "id": "Default"
    },
    {
      "name": "CopyTablesActivity",
      "id": "CopyTablesActivity",
      "workerGroup": "dp01",
      "type": "ShellCommandActivity",
      "command": "..."
    },
    {
      "name": "CreateReportsActivity",
      "id": "CreateReportsActivity",
      "workerGroup": "dp01",
      "type": "ShellCommandActivity",
      "command": "...",
      "dependsOn": {
        "ref": "CopyTablesActivity"
      }
    }
  ],
  "parameters": [...]
}

1 answer:

Answer 0 (score: 0)

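On CopyTablesActivity, you can set the lateAfterTimeout property to around 5 minutes and add an onLateAction property that references a Terminate action. The idea is that if CopyTablesActivity has not finished after 5 minutes, the run is terminated instead of running on into the next scheduled execution. For example, the CopyTablesActivity object might look like this: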

{
  "name": "CopyTablesActivity",
  "id": "CopyTablesActivity",
  "workerGroup": "dp01",
  "lateAfterTimeout": "5 minutes",
  "type": "ShellCommandActivity",
  "onLateAction": {
    "ref": "DefaultAction1"
  },
  "command": "..."
}

You can then define DefaultAction1 like this:

{
  "name": "TerminateTasks",
  "id": "DefaultAction1",
  "type": "Terminate"
}

For details, see: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-terminate.html
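If the pipeline definition is maintained as a JSON file, the change from this answer can be applied mechanically. Below is a minimal Python sketch, using only the standard json module. It adds lateAfterTimeout and onLateAction to CopyTablesActivity, appends the Terminate object, and checks that every {"ref": ...} points at an existing object id. add_late_terminate and dangling_refs are hypothetical helpers, not part of any AWS SDK. The elided parts of the question's definition (roles, log URI, commands, parameters) are left out or kept as "...", and uploading the result to Data Pipeline is not shown.

```python
import json


def add_late_terminate(definition, activity_id, timeout="5 minutes",
                       action_id="DefaultAction1"):
    """Return a patched copy of a pipeline definition (hypothetical helper).

    Sets lateAfterTimeout and onLateAction on `activity_id` and appends a
    Terminate object with id `action_id` if one is not already present.
    """
    patched = json.loads(json.dumps(definition))  # deep copy via a JSON round trip
    objects = patched["objects"]
    by_id = {obj["id"]: obj for obj in objects}
    activity = by_id[activity_id]  # raises KeyError for an unknown activity id
    activity["lateAfterTimeout"] = timeout
    activity["onLateAction"] = {"ref": action_id}
    if action_id not in by_id:
        objects.append({"name": "TerminateTasks", "id": action_id, "type": "Terminate"})
    return patched


def dangling_refs(definition):
    """List (object id, field, ref) for every {"ref": ...} with no matching id."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for field, value in obj.items():
            # A field may hold a single {"ref": ...} or a list of them.
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, dict) and "ref" in item and item["ref"] not in ids:
                    missing.append((obj["id"], field, item["ref"]))
    return missing


# The question's definition, with roles, log URI and parameters left out.
pipeline = {
    "objects": [
        {"id": "DefaultSchedule", "name": "Every 15 minutes", "type": "Schedule",
         "period": "15 Minutes", "startAt": "FIRST_ACTIVATION_DATE_TIME"},
        {"id": "Default", "name": "Default", "scheduleType": "cron",
         "schedule": {"ref": "DefaultSchedule"}, "maxActiveInstances": "1",
         "failureAndRerunMode": "CASCADE"},
        {"id": "CopyTablesActivity", "name": "CopyTablesActivity",
         "type": "ShellCommandActivity", "workerGroup": "dp01", "command": "..."},
        {"id": "CreateReportsActivity", "name": "CreateReportsActivity",
         "type": "ShellCommandActivity", "workerGroup": "dp01", "command": "...",
         "dependsOn": {"ref": "CopyTablesActivity"}},
    ]
}

patched = add_late_terminate(pipeline, "CopyTablesActivity")
print(json.dumps(patched, indent=2))
```

The helper works on a copy and only appends the Terminate object when its id is missing, so running it twice does not create a duplicate action.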