How to resolve a MemoryError on my Django Celery worker instance
I am using Django with Celery and Redis (as the broker). I am seeing the following errors on one of my worker instances.
[2020-12-27 02:26:15,920: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:26:40,937: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:27:00,943: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:27:15,955: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:27:45,971: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:28:02,118: INFO/MainProcess] missed heartbeat from worker@ip-xxx-xx-xx-xxx.ec2.internal
[2020-12-27 02:28:36,496: CRITICAL/MainProcess] Unrecoverable error: MemoryError()
Traceback (most recent call last):
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/worker/worker.py",line 205,in start
self.blueprint.start(self)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/bootsteps.py",line 119,in start
step.start(parent)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/bootsteps.py",line 369,in start
return self.obj.start()
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/worker/consumer/consumer.py",line 318,in start
blueprint.start(self)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/bootsteps.py",in start
step.start(parent)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/worker/consumer/consumer.py",line 596,in start
c.loop(*c.loop_args())
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/celery/worker/loops.py",line 83,in asynloop
next(loop)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/asynchronous/hub.py",line 364,in create_loop
cb(*cbargs)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/transport/redis.py",line 1074,in on_readable
self.cycle.on_readable(fileno)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/transport/redis.py",line 359,in on_readable
chan.handlers[type]()
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/transport/redis.py",line 694,in _receive
ret.append(self._receive_one(c))
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/kombu/transport/redis.py",line 700,in _receive_one
response = c.parse_response()
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/client.py",line 3036,in parse_response
return self._execute(connection,connection.read_response)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/client.py",line 3013,in _execute
return command(*args)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",line 637,in read_response
response = self._parser.read_response()
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",line 330,in read_response
response = [self.read_response() for i in xrange(length)]
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",in <listcomp>
response = [self.read_response() for i in xrange(length)]
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",line 324,in read_response
response = self._buffer.read(length)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",in read
self._read_from_socket(length - self.length)
File "/home/ec2-user/.virtualenvs/xxxxx/lib/python3.7/site-packages/redis/connection.py",line 186,in _read_from_socket
buf.write(data)
MemoryError
[2020-12-27 06:44:31,570: INFO/MainProcess] Connected to redis://xxxxxxxxxx.cache.amazonaws.com:6379//
[2020-12-27 06:44:31,585: INFO/MainProcess] mingle: searching for neighbors
[2020-12-27 06:44:32,611: INFO/MainProcess] mingle: sync with 1 nodes
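I just want to confirm whether this MemoryError is caused by a memory leak somewhere in my code, by some worker-specific issue, or by something else. I would really appreciate any help or suggestions for finding the root cause.
Note: my worker's instance type (on AWS) is t2.small.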
Solution
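That makes sense (it is a small instance), but I am more concerned about the failing health checks (the missed heartbeats).
Here are some thoughts: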
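- Try profiling your Celery task to see how much memory it consumes. Does it exceed the 2 GB available on this instance type?
- What concurrency level did you define for the worker? Have you tried reducing it? If c==2 and each task consumes 2 GB (for example), that would explain your problem.
- Use CloudWatch metrics (in the AWS console) to look at CPU and memory utilization, and check whether the time of the error correlates with peaks in the graphs. Note that EC2 does not report memory utilization by default; it requires the CloudWatch agent on the instance.
- If it is reproducible, try watching htop while the error occurs to confirm that this is a resource limit (memory/CPU).
- Collect these metrics yourself. It always helps in cases like this.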
Good luck!