ECS tasks stuck in PENDING when updating via CloudFormation

Date: 2020-06-12 03:44:00

Tags: amazon-web-services amazon-cloudformation amazon-iam amazon-ecs

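The build itself works fine, but I run into a problem deploying my ECS cluster when the task is updated through CloudFormation. The ECSService starts 6 new tasks, which sit in PENDING, while the 6 old tasks are still RUNNING. Sometimes it starts draining the old tasks and the deployment succeeds, but sometimes none of the old tasks are drained and the ECSService just stays stuck in UPDATE_IN_PROGRESS. How can I troubleshoot something like this?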

Below is my stack template.

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  ElasticLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      SecurityGroups:
      - !Ref 'ELBSecurityGroup'
      Subnets:
      - !Ref 'InstanceSubnet'
      - !Ref 'SecondarySubnet'
      Scheme: internet-facing
  RedirectLoadBalancerListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn: ECSServiceRole
    Properties:
      DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref 'ECSTG'
      LoadBalancerArn: !Ref 'ElasticLoadBalancer'
      Port: '80'
      Protocol: HTTP
  RedirectLoadBalancerListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    DependsOn: RedirectLoadBalancerListener
    Properties:
      Actions:
      - Type: forward
        TargetGroupArn: !Ref 'ECSTG'
      Conditions:
      - Field: path-pattern
        Values:
        - /
      ListenerArn: !Ref 'RedirectLoadBalancerListener'
      Priority: '1'
  LoadBalancerListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn: ECSServiceRole
    Properties:
      Certificates:
      - CertificateArn: !Ref 'SSLCertificateId'
      DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref 'ECSTG'
      LoadBalancerArn: !Ref 'ElasticLoadBalancer'
      Port: '443'
      Protocol: HTTPS
  LoadBalancerListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    DependsOn: LoadBalancerListener
    Properties:
      Actions:
      - Type: forward
        TargetGroupArn: !Ref 'ECSTG'
      Conditions:
      - Field: path-pattern
        Values:
        - /
      ListenerArn: !Ref 'LoadBalancerListener'
      Priority: '1'
  ECSTG:
    DependsOn: ElasticLoadBalancer
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 6
      HealthCheckPath: /api/ping
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 2
      Port: 80
      Protocol: HTTP
      UnhealthyThresholdCount: 5
      VpcId: !Ref 'VPCId'
      TargetGroupAttributes:
      - Key: deregistration_delay.timeout_seconds
        Value: '20'
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: AppSecurityGroup
      SecurityGroupIngress:
      - IpProtocol: '-1'
        FromPort: '-1'
        ToPort: '-1'
        SourceSecurityGroupId: !Ref 'ELBSecurityGroup'
      VpcId: !Ref 'VPCId'
  Route53Entry:
    Type: AWS::Route53::RecordSetGroup
    Properties:
      HostedZoneName: !Join ['', [!Ref 'Route53HostedZone', .]]
      Comment: Zone apex alias targeted to myELB LoadBalancer.
      RecordSets:
      - Name: !Join [., [!Ref 'ApplicationHost', !Ref 'Route53HostedZone']]
        Type: A
        AliasTarget:
          HostedZoneId: !GetAtt [ElasticLoadBalancer, CanonicalHostedZoneID]
          DNSName: !GetAtt [ElasticLoadBalancer, DNSName]
  ELBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: ELBSecurityGroup
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: '443'
        ToPort: '443'
        CidrIp: 0.0.0.0/0
      - IpProtocol: tcp
        FromPort: '80'
        ToPort: '80'
        CidrIp: 0.0.0.0/0
      VpcId: !Ref 'VPCId'
  CloudWatchAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      ActionsEnabled: true
      AlarmActions:
      - arn:aws:sns:us-east-1:6xxxxxxx:instance-alarm
      ComparisonOperator: LessThanOrEqualToThreshold
      Dimensions:
      - Name: LoadBalancer
        Value: !GetAtt [ElasticLoadBalancer, LoadBalancerFullName]
      - Name: TargetGroup
        Value: !GetAtt [ECSTG, TargetGroupFullName]
      EvaluationPeriods: 5
      MetricName: HealthyHostCount
      Namespace: AWS/ApplicationELB
      Period: 60
      Statistic: Maximum
      Threshold: 0
  LowOnCreditAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      ActionsEnabled: true
      AlarmActions:
      - arn:aws:sns:us-east-1:6xxxxxx:instance-alarm
      ComparisonOperator: LessThanThreshold
      Dimensions:
      - Name: AutoScalingGroupName
        Value: !Ref 'AutoScalingGroup'
      EvaluationPeriods: 1
      MetricName: CPUCreditBalance
      Namespace: AWS/EC2
      Period: 300
      Statistic: Average
      Threshold: 15
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      AllocatedStorage: '5'
      DBInstanceClass: db.t2.micro
      Engine: postgres
      BackupRetentionPeriod: 35
      EngineVersion: 9.5.2
      DBName: !If [RestoreDB, '', ekdb]
      MasterUsername: !Ref 'DBUser'
      MasterUserPassword: !Ref 'DBPassword'
      DBSecurityGroups:
      - !Ref 'DatabaseSecurityGroup'
      DBSubnetGroupName: !Ref 'DatabaseSubnetGroup'
      DBSnapshotIdentifier: !Ref 'DBSnapshot'
    DeletionPolicy: Snapshot
  DatabaseSecurityGroup:
    Type: AWS::RDS::DBSecurityGroup
    Properties:
      GroupDescription: DatabaseSecurityGroup
      DBSecurityGroupIngress:
      - EC2SecurityGroupId: !Ref 'AppSecurityGroup'
      EC2VpcId: !Ref 'VPCId'
  Redis:
    Type: AWS::ElastiCache::CacheCluster
    Properties:
      CacheNodeType: cache.t2.micro
      Engine: redis
      EngineVersion: 2.8.24
      NumCacheNodes: 1
      VpcSecurityGroupIds:
      - !Ref 'RedisSecurityGroup'
      CacheSubnetGroupName: !Ref 'RedisSubnetGroup'
  RedisSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: RedisSecurityGroup
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: '6379'
        ToPort: '6379'
        SourceSecurityGroupId: !Ref 'AppSecurityGroup'
      VpcId: !Ref 'VPCId'
  FrontendUser:
    Type: AWS::IAM::User
    Properties:
      Groups:
      - SynapseAppUsers
  BackendUser:
    Type: AWS::IAM::User
    Properties:
      Groups:
      - SynapseAppUsers
  FrontendUserAccessKey:
    Type: AWS::IAM::AccessKey
    Properties:
      UserName: !Ref 'FrontendUser'
  BackendUserAccessKey:
    Type: AWS::IAM::AccessKey
    Properties:
      UserName: !Ref 'BackendUser'
  S3BucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref 'S3Bucket'
      PolicyDocument:
        Statement:
        - Action: s3:GetObject
          Effect: Allow
          Resource: !Sub 'arn:aws:s3:::${S3Bucket}/*'
          Principal:
            AWS:
            - !GetAtt 'FrontendUser.Arn'
            - !GetAtt 'BackendUser.Arn'
        - Action: s3:PutObject
          Effect: Allow
          Resource: !Sub 'arn:aws:s3:::${S3Bucket}/*'
          Principal:
            AWS:
            - !GetAtt 'BackendUser.Arn'
        - Action: s3:PutObjectAcl
          Effect: Allow
          Resource: !Sub 'arn:aws:s3:::${S3Bucket}/*'
          Principal:
            AWS:
            - !GetAtt 'BackendUser.Arn'
        - Action:
          - s3:PutObjectAcl
          - s3:PutObject
          - s3:GetObject
          - s3:DeleteObject
          Effect: Allow
          Resource: !Sub 'arn:aws:s3:::${S3Bucket}/*'
          Principal:
            AWS:
            - arn:aws:iam::6xxxxxxx:user/filestack-v3-policy
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: AuthenticatedRead
      CorsConfiguration:
        CorsRules:
        - AllowedHeaders:
          - '*'
          AllowedMethods:
          - GET
          - PUT
          - POST
          AllowedOrigins:
          - '*'
          ExposedHeaders:
          - ETag
          MaxAge: 3000
    DeletionPolicy: Retain
  AppIamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - ec2.amazonaws.com
          Action:
          - sts:AssumeRole
      Path: /
      Policies:
      - PolicyName: app-iam-role
        PolicyDocument:
          Statement:
          - Effect: Allow
            Action:
            - ecs:*
            - ecr:*
            - sns:*
            - logs:*
            Resource: '*'
          - Effect: Allow
            Action:
            - s3:PutObject
            - s3:GetObject
            - s3:PutObjectAcl
            - s3:DeleteObject
            Resource: !GetAtt [S3Bucket, Arn]
  AppInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
      - !Ref 'AppIamRole'
  LaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      AssociatePublicIpAddress: true
      ImageId: !FindInMap [AWSRegionToAMI, !Ref 'AWS::Region', AMIID]
      InstanceType: !If [IsExclusive, t2.medium, m4.large]
      IamInstanceProfile: !Ref 'AppInstanceProfile'
      SecurityGroups:
      - !Ref 'AppSecurityGroup'
      UserData: !Base64
        Fn::Join:
        - ''
        - - '#!/bin/bash -xe

            '
          - echo ECS_CLUSTER=
          - !Ref 'ECSCluster'
          - ' >> /etc/ecs/ecs.config

            '
          - 'yum install -y aws-cfn-bootstrap

            '
          - '/opt/aws/bin/cfn-signal -e $? '
          - '         --stack '
          - !Ref 'AWS::StackName'
          - '         --resource AutoScalingGroup '
          - '         --region '
          - !Ref 'AWS::Region'
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      LaunchConfigurationName: !Ref 'LaunchConfig'
      MinSize: 1
      MaxSize: 2
      DesiredCapacity: !If [IsExclusive, 1, 2]
      VPCZoneIdentifier:
      - !Ref 'InstanceSubnet'
      HealthCheckGracePeriod: 600
      HealthCheckType: ELB
    CreationPolicy:
      ResourceSignal:
        Timeout: PT15M
    UpdatePolicy:
      AutoScalingReplacingUpdate:
        WillReplace: 'true'
  DatabaseSubnetGroup:
    Type: AWS::RDS::DBSubnetGroup
    Properties:
      DBSubnetGroupDescription: Subnet Group for database
      SubnetIds:
      - !Ref 'SecondarySubnet'
      - !Ref 'InstanceSubnet'
  RedisSubnetGroup:
    Type: AWS::ElastiCache::SubnetGroup
    Properties:
      Description: Subnet Group for Redis
      SubnetIds:
      - !Ref 'SecondarySubnet'
      - !Ref 'InstanceSubnet'
  ECSCluster:
    Type: AWS::ECS::Cluster
  ECSService:
    DependsOn:
    - RedirectLoadBalancerListener
    - LoadBalancerListener
    - AutoScalingGroup
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref 'ECSCluster'
      DesiredCount: !If [IsExclusive, 2, 6]
      Role: !Ref 'ECSServiceRole'
      TaskDefinition: !Ref 'TaskDefinition'
      LoadBalancers:
      - ContainerName: nginx
        ContainerPort: '80'
        TargetGroupArn: !Ref 'ECSTG'
  ECSServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - ecs.amazonaws.com
          Action:
          - sts:AssumeRole
      Path: /
      Policies:
      - PolicyName: ecs-service
        PolicyDocument:
          Statement:
          - Effect: Allow
            Action:
            - elasticloadbalancing:DeregisterInstancesFromLoadBalancer
            - elasticloadbalancing:DeregisterTargets
            - elasticloadbalancing:Describe*
            - elasticloadbalancing:RegisterInstancesWithLoadBalancer
            - elasticloadbalancing:RegisterTargets
            - ec2:Describe*
            - ec2:AuthorizeSecurityGroupIngress
            Resource: '*'
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
      - Name: frontend
        Memory: '256'
        MemoryReservation: '32'
        Image: !Sub '6xxxxxxx0.dkr.ecr.us-east-1.amazonaws.com/frontend:${ImageTag}'
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'ECSLogGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: '[frontend]'
      - Name: backend
        Memory: '1024'
        MemoryReservation: '256'
        Links:
        - xray-daemon
        Environment:
        - Name: NODE_ENV
          Value: prod
        - Name: AWS_XRAY_DAEMON_ADDRESS
          Value: "xray-daemon:2000"
        - Name: APPLICATION_URL
          Value: !Sub 'https://${ApplicationHost}.${Route53HostedZone}'
        - Name: ACCOUNTS_TOKEN
          Value: !Ref AccountsToken
        - Name: ACCOUNTS_URL
          Value: !Ref 'AccountsUrl'
        - Name: HEAP_APPLICATION_ID
          Value: '3901275559'
        - Name: HUBSPOT_API_KEY
          Value: !Ref 'HubspotApiKey'
        - Name: USER_POOL
          Value: !Ref 'UserPool'
        - Name: POOL_CLIENTS
          Value: !Ref 'PoolClients'
        - Name: JWKS
          Value: !Ref 'JWKS'
        - Name: DATABASE_URL
          Value: !Sub ['postgresql://${DBUser}:${DBPassword}@${Address}:${Port}/ekdb',
            {Address: !GetAtt [Database, Endpoint.Address], Port: !GetAtt [Database,
                Endpoint.Port]}]
        - Name: REDIS_URL
          Value: !Sub ['redis://${Address}:${Port}/', {Address: !GetAtt [Redis, RedisEndpoint.Address],
              Port: !GetAtt [Redis, RedisEndpoint.Port]}]
        - Name: S3_FRONTEND_USER_ACCESS_KEY_ID
          Value: !Ref 'FrontendUserAccessKey'
        - Name: S3_FRONTEND_USER_SECRET
          Value: !GetAtt [FrontendUserAccessKey, SecretAccessKey]
        - Name: S3_BACKEND_USER_ACCESS_KEY_ID
          Value: !Ref 'BackendUserAccessKey'
        - Name: S3_BACKEND_USER_SECRET
          Value: !GetAtt [BackendUserAccessKey, SecretAccessKey]
        - Name: S3_BUCKET_NAME
          Value: !Ref 'S3Bucket'
        - Name: UPLOAD_STRATEGY
          Value: S3
        - Name: ACCOUNT_ID
          Value: !Ref 'AccountId'
        - Name: CHECK_ACCOUNT_ID
          Value: !Ref 'CheckAccountId'
        - Name: SNS_TOPIC_ARN
          Value: !Ref 'SNSTopicArn'
        Image: !Sub '6xxxxxx.dkr.ecr.us-east-1.amazonaws.com/backend:${ImageTag}'
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'ECSLogGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: '[backend]'
      - Name: nginx
        Memory: '256'
        MemoryReservation: '32'
        Links:
        - frontend
        - backend
        - pdf_viewer
        - preview
        Image: !Sub '67xxxxxx.dkr.ecr.us-east-1.amazonaws.com/nginx:${ImageTag}'
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'ECSLogGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: '[nginx]'
        PortMappings:
        - ContainerPort: 80
      - Name: pdf_viewer
        Memory: '256'
        MemoryReservation: '32'
        Image: !Sub '6xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/pdf_viewer:${ImageTag}'
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'ECSLogGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: '[pdf_viewer]'
      - Name: preview
        Memory: '256'
        MemoryReservation: '32'
        Image: !Sub '6xxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/preview:${ImageTag}'
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'ECSLogGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: '[preview]'
      - Name: xray-daemon
        Memory: '256'
        MemoryReservation: '32'
        Image: 'amazon/aws-xray-daemon'
        PortMappings:
        - ContainerPort: 2000
          HostPort: 0
          Protocol: "udp"
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'ECSLogGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: '[xray-daemon]'
  ECSLogGroup:
    Type: AWS::Logs::LogGroup
Parameters:
  CheckAccountId:
    Type: String
    Description: Should user's account id be checked while logging in to the instance?
    Default: 'yes'
  Route53HostedZone:
    Type: String
  SSLCertificateId:
    Type: String
    Description: Pass SSL id from AWS Certificate Manager to pass to ELB
  ApplicationHost:
    Type: String
    Description: 'Host to be applied as follows: {host}.{Route53HostedZone}'
  DBUser:
    Type: String
    Description: Username that the database should be accessible with
  DBPassword:
    Type: String
    Description: Password that the database user should have
  HtpasswdEntry:
    Type: String
    Description: This is the file that should be htpasswd entry file
  DBSnapshot:
    Type: String
    Description: Database Snapshot ID if you want to restore DB from snapshot
    Default: ''
  VPCId:
    Type: String
    Description: VPC Id to associate instance to. Pass this if you want to hide the
      instances behind pre-existing VPC
    Default: vpc-355a6b51
  InstanceSubnet:
    Type: String
    Description: Subnet on which the instance should be set up. Required if VPCId
      is set
    Default: subnet-beb826c8
  SecondarySubnet:
    Type: String
    Description: Subnet on which the RDS and ElastiCache group will be set up as well.
      Required if VPCId is set
    Default: subnet-04e39239
  AccountId:
    Type: String
    Description: AccountId. used to filter out users from Auth0
  AccountsUrl:
    Type: String
    Description: Accounts url eg. https://app.getsynapse.com/
  SNSTopicArn:
    Type: String
    Description: ARN of SNS Topic that will be use to communicate between different
      parts of the infrastructure
  HubspotApiKey:
    Type: String
    Description: Hubspot api key
  UserPool:
    Type: String
    Description: Cognito UserPool
  PoolClients:
    Type: String
    Description: Cognito PoolClients
  JWKS:
    Type: String
    Description: Cognito JWKS
  ImageTag:
    Type: String
    Description: Tag of docker images
  AccountsToken:
    Type: String
    Description: Token used for authenticating with Accounts
Conditions:
  RestoreDB: !Not [!Equals [!Ref 'DBSnapshot', '']]
  IsExclusive: !Not [!Equals [!Ref 'AccountId', N/a]]
Outputs:
  InstanceURL:
    Value: !Join ['', ["https://", !Ref 'ApplicationHost', ., !Ref 'Route53HostedZone']]
Mappings:
  AWSRegionToAMI:
    us-east-1:
      AMIID: ami-a7a242da
    us-east-2:
      AMIID: ami-b86a5ddd
    us-west-1:
      AMIID: none
    us-west-2:
      AMIID: none
    eu-west-1:
      AMIID: none
    eu-central-1:
      AMIID: none
    ap-northeast-1:
      AMIID: none
    ap-southeast-1:
      AMIID: none
    ap-southeast-2:
      AMIID: none

1 answer:

Answer 0 (score: 1)

Based on the comments, the problem appears to be related to the MaximumPercent and MinimumHealthyPercent parameters and their default values of 200 and 100:

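  • MaximumPercent: If a service is using the rolling update (ECS) deployment type, the maximum percent parameter represents an upper limit on the number of tasks in a service that are allowed in the RUNNING or PENDING state during a deployment.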

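  • MinimumHealthyPercent: If a service is using the rolling update (ECS) deployment type, the minimum healthy percent represents a lower limit on the number of tasks in a service that must remain in the RUNNING state during a deployment.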

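With the defaults of 200 and 100, a service of 6 tasks must keep all 6 old tasks running during a deployment while ECS tries to start 6 new ones next to them, for up to 12 tasks at once. That appears to be more than the container instances can hold.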

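The suggested solution is to change the values to 150 and 50. ECS may then stop up to 3 of the old tasks (keeping at least 3 RUNNING) to free capacity for new ones, so the deployment can proceed with, for example, 6 tasks running in total (3 new, 3 old) until it completes.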
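
As a rough sketch of how that change might look in the template from the question, the two percentages go under DeploymentConfiguration on the ECSService resource. Everything except the DeploymentConfiguration block is copied from the question; the values 150 and 50 are the ones suggested above, not verified against this particular cluster, and may need tuning to the memory and CPU actually available on the container instances.

```yaml
  ECSService:
    DependsOn:
    - RedirectLoadBalancerListener
    - LoadBalancerListener
    - AutoScalingGroup
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref 'ECSCluster'
      DesiredCount: !If [IsExclusive, 2, 6]
      # Assumed values taken from the answer above; adjust to your capacity.
      # With DesiredCount 6: at most 9 tasks RUNNING or PENDING (150%),
      # and at least 3 tasks kept RUNNING (50%) during a deployment.
      DeploymentConfiguration:
        MaximumPercent: 150
        MinimumHealthyPercent: 50
      Role: !Ref 'ECSServiceRole'
      TaskDefinition: !Ref 'TaskDefinition'
      LoadBalancers:
      - ContainerName: nginx
        ContainerPort: '80'
        TargetGroupArn: !Ref 'ECSTG'
```

Lowering MinimumHealthyPercent trades some serving capacity during the deployment for the ability to roll forward on a cluster that has no spare room; if that trade-off is not acceptable, adding container instance capacity is the alternative.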