AWS Fargate - health checks fail when the instance starts up

Time: 2019-11-24 21:43:12

Tags: amazon-cloudformation aws-fargate

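I have a problem similar to some other posts, but as far as I can tell none of their specific issues apply. I will post my stack later in this post.

I have: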

ALB----->Listener->target group->Fargate service->task definition
80/http           ->8080/http                   -> 8080/http
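
The template further down references the parameters ContainerPort, TargetGroupPort and TargetGroupProtocol, but the Parameters section is not part of the post. A minimal sketch of what it might look like, assuming the defaults follow the port chain above (the types, defaults and allowed values are assumptions, not the author's actual values):

```yaml
# Hypothetical Parameters section. The names match the !Ref calls in the
# template; the defaults are assumptions taken from the port diagram.
Parameters:
  ContainerPort:
    Type: Number
    Default: 8080        # port the container listens on
  TargetGroupPort:
    Type: Number
    Default: 8080        # port the target group forwards to
  TargetGroupProtocol:
    Type: String
    Default: HTTP
    AllowedValues:
      - HTTP
      - HTTPS
```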

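The problem is that my health check fails. When the Fargate task starts an instance, I can hit that instance directly with the health check URL and get a 200 response. However, anything that goes through the load balancer results in a gateway timeout.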

$ curl -fv http://172.31.47.18:8080/healthz
*   Trying 172.31.47.18...
* TCP_NODELAY set
* Connected to 172.31.47.18 (172.31.47.18) port 8080 (#0)
> GET /healthz HTTP/1.1
> Host: 172.31.47.18:8080
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Transfer-Encoding: chunked
< Date: Sun, 24 Nov 2019 15:33:39 GMT
< Server: Warp/3.2.27
< 
* Connection #0 to host 172.31.47.18 left intact
OK

However, the health check never passes through the LB.

  1. The security groups on everything are wide open for now. I wanted to rule that out as a cause.
  2. The Fargate nodes are set up with public IPs.

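This has been driving me crazy for the last few days. I stood up EC2-backed ECS, and everything works on EC2. I should point out that the whole stack builds fine in Fargate, except that no traffic arrives from the load balancer or anything else.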

The error in the service events says:

service test-graph (port 8080) is unhealthy in target-group tg--test-graph due to (reason Request timed out).

Hopefully someone has an idea.

  TaskDef0:
    Type: AWS::ECS::TaskDefinition
    DependsOn: Cluster0
    Properties:
      ExecutionRoleArn: arn:aws:iam::xxxxx:role/ECS_Hasura_Execution_Role
      TaskRoleArn: arn:aws:iam::xxxxx:role/ecsTaskExecutionRole
      Family: !Ref 'ServiceName'
      Cpu: !FindInMap
                - ContainerSizeMap
                - !Ref ContainerSize
                - Cpu
      Memory: !FindInMap
                   - ContainerSizeMap
                   - !Ref ContainerSize
                   - Memory
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ContainerDefinitions:
        - Name: !Ref 'ServiceName'
          Cpu: !FindInMap
                - ContainerSizeMap
                - !Ref ContainerSize
                - Cpu
          Memory: !FindInMap
                   - ContainerSizeMap
                   - !Ref ContainerSize
                   - Memory
          Image: !FindInMap
                - ServiceMap
                - !Ref ServiceProvider
                - ImageUrl
          PortMappings:
            - 
              ContainerPort: !Ref 'ContainerPort'
              HostPort: !Ref ContainerPort
              Protocol: tcp

  ALB0:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    DependsOn: TaskDef0
    Properties: 
      Name: !Join
              - '-'
              - - lb-
                - !Ref ServiceName
      Scheme: internet-facing
      IpAddressType: ipv4
      LoadBalancerAttributes: 
        - Key: deletion_protection.enabled
          Value: false
        - Key: idle_timeout.timeout_seconds
          Value: 60
        - Key: routing.http.drop_invalid_header_fields.enabled
          Value: false
        - Key: routing.http2.enabled
          Value: true
      SecurityGroups: 
        - sg-xxxxxx # allow HTTP/HTTPS to the load balancer
      Subnets: 
        - subnet-111111
        - subnet-222222
        - subnet-333333
      Type: application

  targetGroup0:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    DependsOn: ALB0
    Properties: 
      Name: !Join
              - '-'
              - - tg-
                - !Ref ServiceName
      Port: !Ref TargetGroupPort
      Protocol: !Ref TargetGroupProtocol    
      TargetType: ip
      VpcId: !FindInMap
                - ServiceMap
                - !Ref ServiceProvider
                - VpcId
      # all other parameters can be changed without interruption
      HealthCheckPort: traffic-port
      HealthCheckEnabled: !FindInMap
                - LBTGMap
                - Parameters
                - HealthCheckEnabled
      HealthCheckIntervalSeconds: !FindInMap
                - LBTGMap
                - Parameters
                - HealthCheckIntervalSeconds
      HealthCheckPath: !FindInMap
                - ServiceMap
                - !Ref ServiceProvider
                - HealthCheckPath
      HealthCheckProtocol: !FindInMap
                - ServiceMap
                - !Ref ServiceProvider
                - HealthCheckProtocol
      HealthCheckTimeoutSeconds: !FindInMap
                - LBTGMap
                - Parameters
                - HealthCheckTimeoutSeconds
      HealthyThresholdCount: !FindInMap
                - LBTGMap
                - Parameters
                - HealthyThresholdCount
      UnhealthyThresholdCount: !FindInMap
                - LBTGMap
                - Parameters
                - UnhealthyThresholdCount
      Matcher: 
        HttpCode: !FindInMap
                - ServiceMap
                - !Ref ServiceProvider
                - HealthCheckSuccessCode
      TargetGroupAttributes: 
        - Key: deregistration_delay.timeout_seconds
          Value: !FindInMap
                - LBTGMap
                - Parameters
                - DeregistrationDelay
        - Key: slow_start.duration_seconds
          Value: !FindInMap
                - LBTGMap
                - Parameters
                - SlowStart
        - Key: stickiness.enabled
          Value: !FindInMap
                - LBTGMap
                - Parameters
                - Stickiness

  Listener0:
    # This is the fixed response test listener
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn: ALB0
    Properties:   
      DefaultActions: 
        - Type: fixed-response      
          FixedResponseConfig: 
            ContentType: text/html
            MessageBody: <h1>Working</h1><p>The load balancer test listener is operational</p>
            StatusCode: 200
      LoadBalancerArn: !Ref ALB0
      Port: 9000
      Protocol: HTTP

  Listener1:
    # This is the port 80 listener
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn: ALB0
    Properties:   
      DefaultActions: 
        - Type: forward
          TargetGroupArn: !Ref targetGroup0
      LoadBalancerArn: !Ref ALB0
      Port: 80
      Protocol: HTTP

  Listener2:
    # This is the port 8080 listener
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn: ALB0
    Properties:   
      DefaultActions: 
        - Type: forward
          TargetGroupArn: !Ref targetGroup0
      LoadBalancerArn: !Ref ALB0
      Port: 8080
      Protocol: HTTP

  Listener3:
    # This is the port 443 listener
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn: ALB0
    Properties:   
      Certificates:
        - CertificateArn: !FindInMap
                - CertificateMap
                - !Ref AWS::Region
                - CertifcateArn  
      DefaultActions: 
        - Type: forward
          TargetGroupArn: !Ref targetGroup0
      LoadBalancerArn: !Ref ALB0
      Port: 443
      Protocol: HTTPS

  Service0:
    Type: AWS::ECS::Service
    DependsOn: Listener2
    Properties:
      ServiceName: !Ref 'ServiceName'
      Cluster: !Ref Cluster0
      LaunchType: FARGATE
      DeploymentConfiguration:
        MaximumPercent: !FindInMap
                - ECSServiceMap
                - Parameters
                - MaximumPercent
        MinimumHealthyPercent: !FindInMap
                - ECSServiceMap
                - Parameters
                - MinimumHealthyPercent
      DesiredCount: !Ref 'DesiredTaskCount'
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroups: # this is allow all ports and IPs
            - !FindInMap
                - SecurityGroupMap
                - !Ref AWS::Region
                - sg0
          Subnets:
            - !FindInMap
                - SubnetMap
                - !Ref AWS::Region
                - subnet0
            - !FindInMap
                - SubnetMap
                - !Ref AWS::Region
                - subnet1
            - !FindInMap
                - SubnetMap
                - !Ref AWS::Region
                - subnet2
      TaskDefinition: !Ref 'TaskDef0'
      LoadBalancers:
        - ContainerName: !Ref 'ServiceName'
          ContainerPort: !Ref 'ContainerPort'
          TargetGroupArn: !Ref 'targetGroup0'
      Tags: 
        - Key: Application
          Value: !Ref "Application"
        - Key: Customer
          Value: !Ref "Customer"
        - Key: Role
          Value: !Ref "Role"
        - Key: InternetAccessible
          Value: !Ref "InternetAccessible"
        - Key: CreationDate
          Value: !Ref "CreationDate"
        - Key: CreatedBy
          Value: !Ref "CreatedBy"

Mappings:
  ServiceMap:
    GraphQL-Ohio: 
      ImageUrl: xxxxx.dkr.ecr.us-east-2.amazonaws.com/hasura/graphql-engine
      HealthCheckPath: /healthz
      HealthCheckSuccessCode: 200
      HealthCheckProtocol: HTTP
      VpcId: vpc-xxxxx

  LBTGMap:
    Parameters:
      HealthCheckEnabled: True
      HealthCheckIntervalSeconds: 30
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 5
      UnhealthyThresholdCount: 2
      DeregistrationDelay: 300
      SlowStart: 0
      Stickiness: false

  SubnetMap: # There is technical debt here to keep this up to date as subnets change
    us-east-2:
      subnet0: subnet-111111
      subnet1: subnet-222222
      subnet2: subnet-333333

  SecurityGroupMap: 
    us-east-2: 
      sg0: sg-xxxxx

1 answer:

Answer 0 (score: 0)

OK, I figured it out. I had set HealthCheckPort to traffic-port: the string literal "traffic-port", not the actual port number.
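
The answer does not include the corrected template. A minimal sketch of the change it describes, assuming the fix was to give HealthCheckPort the container's port (8080 in the question) through the existing ContainerPort parameter. The remaining properties are trimmed from the question's targetGroup0, with the mapped values written out for readability. Note that `traffic-port` is normally an accepted value that means "the port each target is registered with", so this sketch only pins the check to the port that answered the curl test:

```yaml
# Sketch only: the answer does not show the corrected resource.
Resources:
  targetGroup0:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Port: !Ref TargetGroupPort
      Protocol: !Ref TargetGroupProtocol
      TargetType: ip
      VpcId: vpc-xxxxx
      # Assumption: use the explicit container port instead of the
      # string literal traffic-port.
      HealthCheckPort: !Ref ContainerPort
      HealthCheckPath: /healthz
      HealthCheckProtocol: HTTP
```

Using the parameter keeps the health check port aligned with the task definition's port mapping if the container port is changed later.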