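I have been trying to run an AWS Data Pipeline whose shell command activity calls a bash process, which in turn launches several long-running Python and Java processes. Every time the shell command activity runs, a reportProgress error is thrown in the Task Runner log after exactly 5 days, and the task is cancelled. The problem persists even when I set the attemptTimeout and lateAfterTimeout fields to more than 5 days. The Task Runner log message and the pipeline JSON definition are shown below: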
Screenshot of pipeline execution error
Task Runner log message:
01 Dec 2018 18:55:05,693 https://forums.aws.amazon.com/ (HeartBeatService-df-01341812NWJEQ1FAYI1K-@ShellCommandActivityId_UdTMC_2018-11-26T18:54:03_Attempt=1) amazonaws.datapipeline.taskrunner.HeartBeatService: HeartBeatService DataPipeline reportProgress error thrown and workCancelleddf-01341812NWJEQ1FAYI1K-@ShellCommandActivityId_UdTMC_2018-11-26T18:54:03_Attempt=1
amazonaws.datapipeline.taskrunner.CanceledTaskException: DataPipeline service requested this work be canceled.
at amazonaws.datapipeline.taskrunner.DataPipelineProgressReporter.reportProgress(DataPipelineProgressReporter.java:31)
...
01 Dec 2018 18:55:06,726 https://forums.aws.amazon.com/ (TaskRunnerService-wg-10000-2) amazonaws.datapipeline.taskrunner.TaskPoller: Work ShellCommandActivity took 7201:0 to complete
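As a sanity check on the "exactly 5 days" observation, here is a minimal sketch that converts the duration from the last log line. It assumes that `7201:0` means minutes:seconds; that reading is my assumption, not something the log states. The helper name `parse_duration` is hypothetical.

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse a 'MM:SS' string (assumed to be minutes:seconds) into a timedelta."""
    minutes, seconds = text.split(":")
    return timedelta(minutes=int(minutes), seconds=int(seconds))


# Duration reported by TaskPoller in the log line above.
elapsed = parse_duration("7201:0")
print(elapsed)  # 5 days, 0:01:00
```

Under that assumption the activity ran for 5 days and 1 minute, which matches the gap between the attempt timestamp (2018-11-26T18:54:03) and the cancellation (01 Dec 2018 18:55:05).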
Pipeline JSON definition:
{
  "objects": [
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "s3://oobhuntoo1/",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "onLateAction": {
        "ref": "ActionId_V6bq0"
      },
      "lateAfterTimeout": "7 Days",
      "name": "DefaultShellCommandActivity1",
      "id": "ShellCommandActivityId_UdTMC",
      "workerGroup": "wg-10000",
      "type": "ShellCommandActivity",
      "command": "python ~/AWS_5day_Test/Python/Layer1.py"
    },
    {
      "name": "DefaultAction1",
      "id": "ActionId_V6bq0",
      "type": "Terminate"
    }
  ],
  "parameters": []
}
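To make it easier to see which timeout settings the definition actually carries, here is a small sketch that lists every field whose name ends in "Timeout" for each pipeline object. The helper name `timeout_fields` is hypothetical, and the embedded JSON is a trimmed copy of the definition above that keeps only the activity object.

```python
import json

# Trimmed copy of the pipeline definition: only the activity object is kept.
DEFINITION = """
{
  "objects": [
    {
      "onLateAction": {"ref": "ActionId_V6bq0"},
      "lateAfterTimeout": "7 Days",
      "name": "DefaultShellCommandActivity1",
      "id": "ShellCommandActivityId_UdTMC",
      "workerGroup": "wg-10000",
      "type": "ShellCommandActivity",
      "command": "python ~/AWS_5day_Test/Python/Layer1.py"
    }
  ],
  "parameters": []
}
"""


def timeout_fields(definition: dict) -> dict:
    """Map each object id to its fields whose key ends in 'timeout' (case-insensitive)."""
    result = {}
    for obj in definition.get("objects", []):
        found = {
            key: value
            for key, value in obj.items()
            if key.lower().endswith("timeout")
        }
        if found:
            result[obj["id"]] = found
    return result


fields = timeout_fields(json.loads(DEFINITION))
print(fields)
# {'ShellCommandActivityId_UdTMC': {'lateAfterTimeout': '7 Days'}}
```

For the definition as posted, the only timeout field on the activity is lateAfterTimeout; no attemptTimeout entry appears in this particular JSON.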