CloudWatch alarm is not treating missing data as notBreaching

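I have a CloudWatch alarm that monitors the API Gateway error rate, but CloudWatch does not seem to treat missing data points as notBreaching.
I want the alarm to fire when the error rate is > 25% over a 5-minute interval.
Alarm details:
Period: 1 minute
Datapoints to alarm: 3 out of 5
Missing data treatment: Treat missing data as good (not breaching threshold)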

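I noticed that the CloudWatch alarm fired with the following reason:

Threshold Crossed: 3 out of the last 5 datapoints [100.0 (27/05/21 21:56:00), 100.0 (27/05/21 21:54:00), 100.0 (27/05/21 21:49:00)] were greater than or equal to the threshold (25.0) and 2 missing datapoints were treated as [NonBreaching] (minimum 3 datapoints for OK -> ALARM transition).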

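I expected a data point to be evaluated for every minute, i.e. the minutes with no data (27/05/21 21:50:00, 27/05/21 21:51:00, 27/05/21 21:52:00, 27/05/21 21:53:00, 27/05/21 21:55:00) should each be marked as good. So the 5 most recent data points should be:
27/05/21 21:56:00: ALARM
27/05/21 21:55:00: OK (missing data treated as not breaching)
27/05/21 21:54:00: ALARM
27/05/21 21:53:00: OK (missing data treated as not breaching)
27/05/21 21:52:00: OK (missing data treated as not breaching)
Of the 5 most recent data points, only 2 should be in ALARM state, so the alarm should not have fired.
What am I missing?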

Terraform code snippet:

resource "aws_cloudwatch_metric_alarm" "api_error_spike" {
  alarm_name = "API error rate exceeding threshold"
  alarm_description = "API error rate has exceeded allowed 25% threshold over 5 minutes"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods = "5"
  datapoints_to_alarm = "3" // 3 out of 5 data points should be in ALARM state to trigger alarm
  treat_missing_data = "notBreaching"

  threshold = 25

  metric_query {
    id = "e1"
    expression = "(m1+m2)*100"
    label = "API Error Rate"
    return_data = "true"
  }

  metric_query {
    id = "m1"
    metric {
      metric_name = "5XXError"
      period = "60" // 60 seconds is the lowest precision for standard (in AWS/ namespace) metrics
      stat = "Average" // Average represents Error rate. Sum represents total errors
      unit = "Count"
      namespace = "AWS/ApiGateway"
      dimensions = {
        ApiName = "foo"
      }
    }
  }

  metric_query {
    id = "m2"
    metric {
      metric_name = "4XXError"
      period = "60"
      stat = "Average" // Average represents Error rate. Sum represents total errors
      unit = "Count"
      namespace = "AWS/ApiGateway"
      dimensions = {
        ApiName = "foo"
      }
    }
  }
}
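
The alarm reason quoted above lists a data point from 21:49:00, which suggests CloudWatch looked further back than the five most recent minutes when some periods were empty, and only counted missing points once it ran out of real ones. If that is what happened here, one possible workaround is to make the expression itself emit a value for every minute, so there are no gaps left to skip over. The sketch below is an untested variant of the alarm above that wraps each metric in the metric math function FILL. The resource name api_error_spike_filled and the changed alarm name are hypothetical, and filling with 0 assumes that a minute with no reported data means no errors.

```hcl
# Hypothetical variant of the alarm above: only the expression differs in behavior.
resource "aws_cloudwatch_metric_alarm" "api_error_spike_filled" {
  alarm_name          = "API error rate exceeding threshold (filled)"
  alarm_description   = "API error rate has exceeded allowed 25% threshold over 5 minutes"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 5
  datapoints_to_alarm = 3 // 3 out of 5 data points must breach to trigger the alarm
  treat_missing_data  = "notBreaching"
  threshold           = 25

  metric_query {
    id = "e1"
    // FILL(m, 0) substitutes 0 for any period in which metric m has no data point,
    // so the expression is intended to return a value for every 1-minute period.
    expression  = "(FILL(m1, 0) + FILL(m2, 0)) * 100"
    label       = "API Error Rate"
    return_data = true
  }

  metric_query {
    id = "m1"
    metric {
      metric_name = "5XXError"
      namespace   = "AWS/ApiGateway"
      period      = 60
      stat        = "Average"
      dimensions = {
        ApiName = "foo"
      }
    }
  }

  metric_query {
    id = "m2"
    metric {
      metric_name = "4XXError"
      namespace   = "AWS/ApiGateway"
      period      = 60
      stat        = "Average"
      dimensions = {
        ApiName = "foo"
      }
    }
  }
}
```

With every minute filled, the five most recent periods in the example would be 100, 0, 100, 0, 0, i.e. 2 breaching out of 5, which matches the expected result described in the question. Whether CloudWatch then evaluates exactly those five periods would still need to be confirmed against the alarm's history.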
