Memory error on my Django Celery worker instance

I am using Django with Celery and Redis (as the broker). I am seeing the following error on one of my worker instances:

[2020-12-27 02:26:15,920: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:26:40,937: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:27:00,943: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:27:15,955: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:27:45,971: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:28:02,118: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:28:36,496: CRITICAL/MainProcess] Unrecoverable error: MemoryError()
Traceback (most recent call last):
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/worker/worker.py",line 205,in start
    self.blueprint.start(self)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/bootsteps.py",line 119,in start
    step.start(parent)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/bootsteps.py",line 369,in start
    return self.obj.start()
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/worker/consumer/consumer.py",line 318,in start
    blueprint.start(self)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/bootsteps.py",in start
    step.start(parent)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/worker/consumer/consumer.py",line 596,in start
    c.loop(*c.loop_args())
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/worker/loops.py",line 83,in asynloop
    next(loop)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/asynchronous/hub.py",line 364,in create_loop
    cb(*cbargs)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/transport/redis.py",line 1074,in on_readable
    self.cycle.on_readable(fileno)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/transport/redis.py",line 359,in on_readable
    chan.handlers[type]()
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/transport/redis.py",line 694,in _receive
    ret.append(self._receive_one(c))
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/transport/redis.py",line 700,in _receive_one
    response = c.parse_response()
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/client.py",line 3036,in parse_response
    return self._execute(connection,connection.read_response)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/client.py",line 3013,in _execute
    return command(*args)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",line 637,in read_response
    response = self._parser.read_response()
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",line 330,in read_response
    response = [self.read_response() for i in xrange(length)]
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",in <listcomp>
    response = [self.read_response() for i in xrange(length)]
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",line 324,in read_response
    response = self._buffer.read(length)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",in read
    self._read_from_socket(length - self.length)
  File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",line 186,in _read_from_socket
    buf.write(data)
MemoryError
[2020-12-27 06:44:31,570: INFO/MainProcess] Connected to redis://xxxxxxxxxx.cache.amazonaws.com:6379//
[2020-12-27 06:44:31,585: INFO/MainProcess] mingle: searching for neighbors
[2020-12-27 06:44:32,611: INFO/MainProcess] mingle: sync with 1 nodes

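I just want to confirm whether this memory error is caused by a memory leak somewhere in my code, by some worker-specific problem, or by something else. I would appreciate any help or suggestions for finding the root cause.

Note: my worker instance (on AWS) is of type t2.small.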

Solution

That makes sense (it is a small instance), but I'm more worried about the failing health checks (missed heartbeats).

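Here are some ideas:

  • Try profiling your Celery tasks to see how much memory they consume. Is it more than the 2 GB this instance type provides?
  • What concurrency level have you configured for the worker? Have you tried lowering it? If c == 2 (the worker's -c / --concurrency option) and each task consumes 2 GB (for example), that would explain your problem.
  • Use CloudWatch metrics (in the AWS console) to look at CPU and memory utilization, and check whether the time of the error correlates with peaks in the graphs. Note that EC2 does not publish memory metrics by default; they are available only if the CloudWatch agent is installed on the instance.
  • If the error is reproducible, you can run htop while it occurs to confirm that this really is a resource limit (memory/CPU).
  • Collect these metrics yourself; they always help in situations like this.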
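As a starting point for the first idea (profiling a task's memory use), here is a minimal sketch that uses only the standard library's tracemalloc module. The decorator name log_peak_memory and the function build_report are hypothetical and not taken from the question. tracemalloc only sees allocations made through Python's allocator and adds noticeable overhead, so treat the numbers as a rough lower bound rather than the process's real resident memory.

```python
import functools
import logging
import tracemalloc

logger = logging.getLogger(__name__)


def log_peak_memory(func):
    """Log the peak Python-level memory allocated while func runs.

    Hypothetical helper: assumes calls are not nested and that nothing
    else in the process is using tracemalloc at the same time.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            # get_traced_memory() returns (current, peak) in bytes.
            _current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            wrapper.last_peak_bytes = peak
            logger.info(
                "%s peak traced memory: %.1f MiB",
                func.__name__,
                peak / 2**20,
            )

    wrapper.last_peak_bytes = None
    return wrapper


@log_peak_memory
def build_report(n):
    # Hypothetical stand-in for a task body: allocates n blocks of 1 KiB.
    rows = [bytes(1024) for _ in range(n)]
    return len(rows)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    build_report(2000)
```

In a Celery project the same decorator would presumably sit directly under the task decorator (for example @app.task above @log_peak_memory), so the measurement is written to the worker log each time the task runs. If a single task's peak is close to the instance's 2 GB, or the peak multiplied by the concurrency level exceeds it, that points to a resource limit rather than a leak.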

Good luck!