CloudFormation ECS Fargate autoscaling target tracking: 1 custom alarm in 1 minute: Failed to execute action

I have been able to set up the following, both through the console and in a CloudFormation template:

  • a scalable target associated with my ALB,
  • a CPU target tracking scaling policy,
  • an ALBRequestCountPerTarget target tracking policy.

This all works fine. Creating the policies in my CloudFormation template also takes care of creating the associated scale-out and scale-in alarms.

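The problem: the auto-created alarms only fire after 3 breaching datapoints within the last 3 periods of 60 seconds. So when a sudden load comes in, the ECS cluster's service takes 3 minutes to scale out. That is too long for me; I want it to scale out as fast as possible. Also, judging from the documentation, the minimum period for ALB RequestCountPerTarget seems to be 60 seconds: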

Only a period greater than 60s is supported for metrics in the "AWS/" namespaces

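Manual solution: I can go into the console, find the HIGH and LOW alarms that were created for me in the CloudWatch service, and edit the HIGH alarm (the one that triggers scale-out). There I set the alarm's evaluation "Period" to 60 seconds, "DatapointsToAlarm" to 1 (trigger the scale-out action as soon as the alarm fires), "EvaluationPeriods" to 1 (only consider the last 60-second period), and "Threshold" to 500 (if there are more than 500 requests on my ALB in the last 60 seconds, add capacity, i.e. scale out).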
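For reference, the console edits above map onto the following AWS::CloudWatch::Alarm properties. This is a fragment, not a complete template. The logical ID ScaleOutAlarmSketch is hypothetical, and the dimensions assume the TargetGroup resource and the EcsloadBalancerFullName import from the template below. AlarmActions is deliberately left out, because wiring the action is exactly what goes wrong in the rest of this question.

```yaml
# Fragment: the manual console settings expressed as CloudFormation alarm properties.
ScaleOutAlarmSketch:                  # hypothetical logical ID
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/ApplicationELB
    MetricName: RequestCountPerTarget
    Statistic: Sum
    Period: 60                        # one datapoint per 60 s
    EvaluationPeriods: 1              # only the most recent 60 s period is evaluated
    DatapointsToAlarm: 1              # a single breaching datapoint puts the alarm in ALARM
    Threshold: 500                    # breach when the 60 s Sum is above 500
    ComparisonOperator: GreaterThanThreshold
    Dimensions:
      - Name: LoadBalancer
        Value: !ImportValue EcsloadBalancerFullName
      - Name: TargetGroup
        Value: !GetAtt TargetGroup.TargetGroupFullName
    # AlarmActions intentionally omitted in this fragment
```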

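To test this, I used JMeter to send a large number of requests. I can see the alarm fire within about a minute, and my ECS service adjusts its desired running task count. This all works fine.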

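But we are all supposed to write infrastructure as code (IaC) these days, right? So I want to include the console tweaks above in my CloudFormation template. That is where the problem occurs. What I did: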

  • I added two new alarms to the CloudFormation template: one HIGH (scale out) and one LOW (scale in),
  • both new alarms point to the existing scaling policy.

What happened:

  • the alarm goes into the ALARM state within 60 seconds,
  • the policy tries to run the action (scale out), but fails with this error:

Failed to execute action arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster/my-ecs-service:policyName/alb-requests-per-target-per-minute. Received error: ""

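I don't know what this means. All it tells me is that the alarm fired correctly and something tried to act on it and scale out, but it failed (without giving any error!).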

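I tried comparing this error with the other, successful actions [the actions of the alarms that AWS creates automatically when the auto scaling policy is created = the actions that fire at t=3 minutes]. The only difference I can see is that the action ARN in the error message seems to be missing "createdBy". When the "default" alarms (the ones provisioned automatically when CloudFormation creates the auto scaling policy) trigger the action, "createdBy" appears to be appended to the action ARN: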

Successfully executed action arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-ecs-service:policyName/tpg-cce-cpu-target-tracking-scaling-policy:createdBy/59b3e5ac-81ae-490f-8ecb-00241506a15e

Successfully executed action arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-ecs-service:policyName/alb-requests-per-target-per-minute:createdBy/ffacb0ac-2456-4751-b9c0-b909c66e9868

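Note the difference above: createdBy is missing from the policy ARN when the action is triggered by my custom alarm. But I don't know how to get it. In CloudFormation I use Ref on the policy to get its ARN, and nothing mentions appending any kind of "createdBy" to the end of the policy ARN. (This may not be the problem at all. I am just listing what I have found so far, and this is the only difference I have found, so it may be a red herring.)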
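One way to compare the ARN that CloudFormation actually places in AlarmActions with the action ARN shown on the auto-created alarm is to expose it as a stack output. This is a minimal sketch that assumes an Outputs section is appended to the template below. The output name ALBScalingPolicyArn is hypothetical. For an AWS::ApplicationAutoScaling::ScalingPolicy, Ref returns the policy ARN, which is the value the custom alarms reference.

```yaml
# Fragment: print the ARN that !Ref resolves to, for side-by-side comparison in the console.
Outputs:
  ALBScalingPolicyArn:                # hypothetical output name
    Description: ARN returned by Ref for the ALB target tracking policy (the value used in AlarmActions)
    Value: !Ref ServiceScalingPolicyALB
```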

A possible clue: when I look at the alarms in the AWS console under CloudWatch,

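  • I can edit the CloudWatch HIGH alarm that AWS created automatically when I created the policy,
  • I cannot edit my custom "scale out" alarm (the one at the bottom of the CloudFormation template below). When I try to edit the custom alarm, the console shows this error: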

Unable to edit myCloudFormationStack-ALBRequestsScaleOutAlarm-E4VY9ZOJ5DOF as it is an Auto Scaling alarm with a target tracking scaling policy.

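Another difference between my alarm and the alarm created automatically with the policy shows up in the alarm details. Under CloudWatch -> Alarms in the AWS console, the "Actions" section looks different. For the auto-provisioned alarm, I see: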

When in alarm, execute this action arn:aws:autoscaling:us-east-1:MYACCOUNTID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-service:policyName/alb-requests-per-target-per-minute:createdBy/ffacb0ac-2456-4751-b9c0-b909c66e9868

But for my own alarm (defined in the CloudFormation template below), I see this in the alarm details (Actions):

When in alarm, use policy alb-requests-per-target-per-minute (keep metric ALBRequestCountPerTarget at target value 1000.)

Here is my full CloudFormation template:

AWSTemplateFormatVersion: '2010-09-09'
Description: ECS task definition and service, hooked up to the ALB via a target group

# IMPORTANT: this needs the first CloudFormation layers in place (see the imports below)

Parameters:
  ContainerImageIdParam:
    Description: The ECR container image ID and tag to deploy
    Type: String
    Default: MYACCOUNTID.dkr.ecr.us-east-1.amazonaws.com/myapp:v10

  JDBCUrlParam:
    Description: The JDBC URL to the RDS database (use the Route53 DNS entry for your database, NOT the AWS URL)
    Type: String
    Default: jdbc-secretsmanager:mysql://mysql.MYPIERRednS.org:3306/MYDATABASE?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC

Resources:
  Task:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: myapp
      Cpu: 512
      Memory: 1024
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !ImportValue ECSTaskExecutionRole
      TaskRoleArn: !ImportValue ECSTaskRole
      ContainerDefinitions:
        - Name: myapp-container
          Image: !Ref ContainerImageIdParam
          Cpu: 512
          Memory: 1024
          Environment:
            - Name: JDBC_DB_URL
              Value: !Ref JDBCUrlParam
            - Name: JDBC_DB_DRIVER_CLASS
              Value: com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver
            - Name: JDBC_DB_USERNAME
              Value: dev/myapp/mysql
            - Name: JDBC_DB_PASSWORD
              Value: notUsedButECSWillErrorIfMissing
            - Name: DB_NUM_THREADS
              Value: "10"
          PortMappings:
            - ContainerPort: 9000
              Protocol: tcp
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: /ecs/myapp
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: myapp-app

  Service:
    Type: AWS::ECS::Service
    DependsOn: ListenerRule
    Properties:
      ServiceName: myapp-service # TODO: if we remove the name, one will be generated automatically
      TaskDeFinition: !Ref Task
      Cluster: !ImportValue ECSCluster
      LaunchType: FARGATE
      DesiredCount: 1 # set this to 0 if cloudformation has issues creating this stack (otherwise takes 3 hours and then fails/timeouts)
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 0 
      HealthCheckGracePeriodSeconds: 30
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets:
            - !ImportValue PrivateSubnet1
            - !ImportValue PrivateSubnet2
          SecurityGroups:
            - !ImportValue ECSServiceSecurityGroup
      LoadBalancers:
        - ContainerName: myapp-container
          ContainerPort: 9000
          TargetGroupArn: !Ref TargetGroup

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: myapp-tg
      VpcId: !ImportValue VPC
      Port: 9000
      Protocol: HTTP
      Matcher:
        HttpCode: 200-299
      HealthCheckIntervalSeconds: 30
      HealthCheckPath: /myapp-svc/index.html
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 10
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 6
      TargetType: ip
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 30

  ListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      ListenerArn: !ImportValue LoadBalancerListenerHTTPS
      Priority: 20
      Conditions:
        - Field: path-pattern
          Values:
            - /myapp-svc/*
      Actions:
        - TargetGroupArn: !Ref TargetGroup
          Type: forward

  ECSAutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: 6
      MinCapacity: 1
      ResourceId: !Join ["/", [service, !ImportValue ECSCluster, !GetAtt Service.Name]] # service/clusterName/serviceName = service/ecs-cluster-myapp/myapp-service
      RoleARN: !Sub 'arn:aws:iam::${AWS::AccountId}:role/aws-service-role/ecs.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_ECSService'
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs

  CPUUtilizationAutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: myapp-cpu-target-tracking-scaling-policy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        DisableScaleIn: true # disable scale-in for this policy so the ALB request policy takes priority on scale-in decisions
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleInCooldown: 300
        ScaleOutCooldown: 30
        TargetValue: 50 # Average 50% cpu utilization


  ServiceScalingPolicyALB:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: alb-requests-per-target-per-minute
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 1000
        ScaleInCooldown: 300
        ScaleOutCooldown: 30
        PredefinedMetricSpecification:
          PredefinedMetricType: ALBRequestCountPerTarget
          ResourceLabel: !Join
            - '/'
            - - !ImportValue EcsloadBalancerFullName
              - !GetAtt TargetGroup.TargetGroupFullName

  # NOTE: the ALB RequestCountPerTarget metric alarms are created automatically
  # when we use that policy. But if we want a different evaluation period,
  # we need to define our own alarms. So the new scale-IN/OUT alarms are included below.

  # SCALE OUT ALARM: if the total (SUM) of ALB requests per target is ABOVE the
  # threshold a certain number of times in the past period, THEN send the "scale out" alarm.
  ALBRequestsScaleOutAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      MetricName: RequestCountPerTarget
      Namespace: AWS/ApplicationELB # Only a period greater than 60s is supported for metrics in the "AWS/" namespaces
      ActionsEnabled: true
      AlarmActions:
        - !Ref ServiceScalingPolicyALB
      # OKActions: []
      # InsufficientDataActions: []
      Statistic: Sum
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue EcsloadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      Period: 60     # evaluation period (in seconds) = 1 datapoint
      EvaluationPeriods: 1 # number of previous periods to take into account
      DatapointsToAlarm: 1 # number of datapoints above threshold needed to generate an alarm
      Threshold: 1000  # alarm threshold: more than 1000 requests
      Unit: None
      ComparisonOperator: GreaterThanThreshold

  # SCALE IN ALARM: if the total (SUM) of ALB requests per target is BELOW the
  # threshold a certain number of times in the past period, THEN send the "scale in" alarm.
  ALBRequestsScaleInAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      MetricName: RequestCountPerTarget
      Namespace: AWS/ApplicationELB
      Statistic: Sum
      Period: 60     # evaluation period (in seconds) = 1 datapoint
      EvaluationPeriods: 5 # number of previous periods to take into account
      DatapointsToAlarm: 1 # number of datapoints below threshold needed to generate an alarm
      Threshold: 500  # alarm threshold: less than 500 requests
      Unit: None
      AlarmActions:
        - !Ref ServiceScalingPolicyALB
      OKActions:
        - !Ref ServiceScalingPolicyALB
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue EcsloadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      ComparisonOperator: LessThanThreshold

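Q1: What am I doing wrong? How do I specify the ACTION in CloudFormation so that my alarm triggers the same action as the auto-provisioned AWS alarm (the one created automatically when I create the policy)?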

Q2: Is there a way to see the ACTION in the AWS console? I'm guessing AWS hides this behind the scenes (maybe a Lambda or something?).

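Q3: Does anyone have another way of doing this? Maybe with step scaling? I'm also open to triggering in under 60 seconds, so maybe I should move away from target tracking?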
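For the step scaling route raised in Q3, here is a minimal sketch that assumes it is added to the template above. It reuses ECSAutoScalingTarget, TargetGroup and the EcsloadBalancerFullName import. The logical IDs, the policy name and the step sizes are hypothetical. A step scaling policy does not create any alarms of its own, so the 1-minute alarm that invokes it is defined explicitly. Only the scale-out half is shown.

```yaml
# Sketch: step scaling policy invoked by a custom 1-minute alarm (hypothetical names and step sizes).
ServiceStepScalingPolicyALB:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: alb-requests-step-scale-out       # hypothetical policy name
    PolicyType: StepScaling
    ScalingTargetId: !Ref ECSAutoScalingTarget
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity            # add a fixed number of tasks per step
      Cooldown: 60                                # cooldown period in seconds
      StepAdjustments:
        # Bounds are offsets from the alarm threshold (1000 below).
        - MetricIntervalLowerBound: 0             # 1000 <= metric < 2000: add 1 task
          MetricIntervalUpperBound: 1000
          ScalingAdjustment: 1
        - MetricIntervalLowerBound: 1000          # metric >= 2000: add 2 tasks
          ScalingAdjustment: 2

ALBRequestsStepScaleOutAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/ApplicationELB
    MetricName: RequestCountPerTarget
    Statistic: Sum
    Period: 60                                    # minimum period for AWS/ namespace metrics
    EvaluationPeriods: 1
    DatapointsToAlarm: 1
    Threshold: 1000
    ComparisonOperator: GreaterThanThreshold
    Dimensions:
      - Name: LoadBalancer
        Value: !ImportValue EcsloadBalancerFullName
      - Name: TargetGroup
        Value: !GetAtt TargetGroup.TargetGroupFullName
    AlarmActions:
      - !Ref ServiceStepScalingPolicyALB          # Ref returns the step scaling policy ARN
```

A matching scale-in policy would use negative ScalingAdjustment values and step bounds at or below 0, paired with a LessThanThreshold alarm.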

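If anyone has a working sample CloudFormation template that can trigger within a minute or less based on the number of requests to the ALB, that would be great :) I have asked how to trigger in less than a minute in a separate question: ECS Fargate autoscaling more rapidly?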
