How can I get the number of file descriptors in use / open connections in the context of the Java AWS SDK?


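Problem description

I am currently seeing SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time (see the full error log) from the Lambda SDK 2.0 (with the Netty client) in a service where multiple nodes poll SQS messages from N queues and try to invoke Lambda at a very high (unthrottled) rate.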

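I tried applying backpressure based on each node's CPU usage. That did not really help: consuming SQS messages at a high rate still creates a large number of network connections on each host while CPU usage stays low, so the same error occurs.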

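Increasing the connection acquisition timeout did not help either (it even made things worse), because the backlog of connection acquisitions keeps piling up while new Lambda invocation requests keep arriving. The same applies to increasing the maximum number of connections (currently set to 120000).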

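So I am building an SQS backpressure mechanism that stops a node from polling more messages based on the number of network connections open on that node.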
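To make the idea concrete, here is a minimal sketch of such a gate, not a definitive implementation. The class name ConnectionBackpressureGate, the method mayPoll, and the two watermarks are hypothetical and are not part of the AWS SDK. The sketch assumes the caller can supply the current connection count from some source (for example, one of the options listed under "Solutions considered"). It uses two thresholds rather than one so that polling does not flap on and off when the count hovers around a single limit.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical gate: polling pauses once the open-connection count reaches
 * the high watermark and resumes only after it falls to the low watermark.
 */
final class ConnectionBackpressureGate {
    private final LongSupplier openConnections; // assumed source of the current count
    private final long lowWatermark;
    private final long highWatermark;
    private boolean paused = false;

    ConnectionBackpressureGate(LongSupplier openConnections, long lowWatermark, long highWatermark) {
        if (lowWatermark > highWatermark) {
            throw new IllegalArgumentException("lowWatermark must not exceed highWatermark");
        }
        this.openConnections = openConnections;
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    /** Returns true if the caller may poll another batch of messages right now. */
    synchronized boolean mayPoll() {
        long current = openConnections.getAsLong();
        if (paused) {
            if (current <= lowWatermark) {
                paused = false; // enough connections were released, resume polling
            }
        } else if (current >= highWatermark) {
            paused = true; // too many connections in use, stop polling
        }
        return !paused;
    }

    public static void main(String[] args) {
        // Simulated connection counts; a real node would read a live metric instead.
        long[] samples = {10, 90, 120, 80, 40, 60};
        int[] index = {0};
        ConnectionBackpressureGate gate =
                new ConnectionBackpressureGate(() -> samples[index[0]], 50, 100);
        for (int i = 0; i < samples.length; i++) {
            index[0] = i;
            System.out.println(samples[i] + " open -> mayPoll=" + gate.mayPoll());
        }
    }
}
```

A polling loop would call mayPoll() before each SQS receive call and sleep briefly when it returns false.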

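The questions are:

  1. How can I get the number of open connections on a host? (other than the solutions below)
  2. Is there any Java library/framework that can be used without implementing custom code for the options mentioned below?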

Solutions considered

  1. Based on the LeasedConcurrency metric emitted as part of the SDK metrics (obtained via CloudWatchMetricPublisher)
  2. Based on the JMX FileDescriptorUse metric
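For the second option, one way to read a comparable number from inside the JVM is the platform OperatingSystemMXBean. The sketch below assumes a Unix-like host and a JDK that exposes com.sun.management.UnixOperatingSystemMXBean; the helper class OpenFileDescriptors is hypothetical. Note that this count covers every descriptor the process holds (files, pipes, sockets), so it is an upper bound on open connections rather than an exact figure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.OptionalLong;

/** Hypothetical helper that reads file descriptor counts of the current JVM process. */
final class OpenFileDescriptors {

    /** Number of descriptors currently open, or empty if the platform bean does not expose it. */
    static OptionalLong openCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return OptionalLong.of(
                    ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount());
        }
        return OptionalLong.empty(); // e.g. on Windows
    }

    /** Per-process descriptor limit, or empty if the platform bean does not expose it. */
    static OptionalLong maxCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return OptionalLong.of(
                    ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount());
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        System.out.println("open descriptors: " + openCount());
        System.out.println("max descriptors:  " + maxCount());
    }
}
```

The value returned by openCount() could serve as the input to a polling threshold, ideally expressed as a fraction of maxCount() so the limit follows the host's configured descriptor limit.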

Full error log

software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.
Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.
If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98) ~[AwsJavaSdk-Core-2.0.jar:?]

PS

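Links to any related networking/OS/backpressure resources, including low-level details (for example, why CPU usage stays low while the host has a large number of connections to handle), would be much appreciated.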
