How to stop Postgres in Docker from being OOM-killed by the host
I have a VM with 8 GB of memory running two Docker containers:
The data loads completely, but during the import I get stuck on a CREATE INDEX statement that adds a simple hash(ID) index to a table with roughly 800 million records.
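Concretely, the statement is of this form (the table and column names here are placeholders, not my actual schema):

CREATE INDEX ON big_table USING hash (id);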
The problem during index creation is that the postgres process keeps growing until the host kernel OOM-kills it.
The Docker container is configured, via the (Nomad) configuration parameters, to use about 6 GB (but I have also tried lower values):
config {
  memory_hard_limit = 6144
  ...
}

resources {
  memory = 6144
}
In addition, the host has vm.overcommit_memory = 1 (previously the default of 0) and vm.overcommit_ratio = 100 (previously the default of 50).
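For completeness, those kernel settings can be checked and applied with standard sysctl commands, e.g.:

# show the current overcommit policy and ratio
sysctl vm.overcommit_memory vm.overcommit_ratio
# apply the values described above (run as root; not persisted across reboots)
sysctl -w vm.overcommit_memory=1
sysctl -w vm.overcommit_ratio=100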
The memory limit is applied correctly, as the container's cgroup shows:
# more /sys/fs/cgroup/memory/memory.limit_in_bytes
6442450944
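It can also be cross-checked through the Docker CLI, for example (the container name is a placeholder):

# memory limit Docker applied, in bytes
docker inspect --format '{{.HostConfig.Memory}}' <container>
# live usage against the limit
docker stats --no-stream <container>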
But when Postgres runs the CREATE INDEX, it eventually gets OOM-killed:
Oct 1 15:17:46 prod kernel: [11515.764300] postgres invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Oct 1 15:17:46 prod kernel: [11515.764302] postgres cpuset=d562f548ef35884fb77a41e53466a18205d63cce3d9ffa8a808d2e95b5609e3e mems_allowed=0
Oct 1 15:17:46 prod kernel: [11515.764308] CPU: 1 PID: 18485 Comm: postgres Not tainted 4.15.0-118-generic #119-Ubuntu
Oct 1 15:17:46 prod kernel: [11515.764309] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-1ubuntu1 04/01/2014
Oct 1 15:17:46 prod kernel: [11515.764309] Call Trace:
Oct 1 15:17:46 prod kernel: [11515.764331] dump_stack+0x6d/0x8e
Oct 1 15:17:46 prod kernel: [11515.764335] dump_header+0x71/0x285
Oct 1 15:17:46 prod kernel: [11515.764337] oom_kill_process+0x21f/0x420
Oct 1 15:17:46 prod kernel: [11515.764339] out_of_memory+0x116/0x4e0
Oct 1 15:17:46 prod kernel: [11515.764343] mem_cgroup_out_of_memory+0xbb/0xd0
Oct 1 15:17:46 prod kernel: [11515.764345] mem_cgroup_oom_synchronize+0x2e8/0x320
Oct 1 15:17:46 prod kernel: [11515.764347] ? mem_cgroup_css_reset+0xe0/0xe0
Oct 1 15:17:46 prod kernel: [11515.764349] pagefault_out_of_memory+0x36/0x7b
Oct 1 15:17:46 prod kernel: [11515.764354] mm_fault_error+0x90/0x180
Oct 1 15:17:46 prod kernel: [11515.764355] __do_page_fault+0x46b/0x4b0
Oct 1 15:17:46 prod kernel: [11515.764357] do_page_fault+0x2e/0xe0
Oct 1 15:17:46 prod kernel: [11515.764363] ? async_page_fault+0x2f/0x50
Oct 1 15:17:46 prod kernel: [11515.764366] do_async_page_fault+0x51/0x80
Oct 1 15:17:46 prod kernel: [11515.764367] async_page_fault+0x45/0x50
Oct 1 15:17:46 prod kernel: [11515.764369] RIP: 0033:0x562aff7df26d
Oct 1 15:17:46 prod kernel: [11515.764370] RSP: 002b:00007ffef148b790 EFLAGS: 00010206
[...]
Oct 1 15:17:46 prod kernel: [11515.764374] R13: 0000000000000195 R14: 0000000000000000 R15: 0000000000000000
Oct 1 15:17:46 prod kernel: [11515.764376] Task in /docker/d562f548ef35884fb77a41e53466a18205d63cce3d9ffa8a808d2e95b5609e3e killed as a result of limit of /docker/d562f548ef35884fb77a41e53466a18205d63cce3d9ffa8a808d2e95b5609e3e
Oct 1 15:17:46 prod kernel: [11515.764382] memory: usage 6291456kB, limit 6291456kB, failcnt 6210203
Oct 1 15:17:46 prod kernel: [11515.764383] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Oct 1 15:17:46 prod kernel: [11515.764384] kmem: usage 29264kB, failcnt 0
Oct 1 15:17:46 prod kernel: [11515.764384] Memory cgroup stats for /docker/d562f548ef35884fb77a41e53466a18205d63cce3d9ffa8a808d2e95b5609e3e: cache:1628316KB rss:4633876KB rss_huge:0KB shmem:1628260KB mapped_file:1628272KB dirty:0KB writeback:0KB inactive_anon:1617900KB active_anon:4644212KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 1 15:17:46 prod kernel: [11515.764391] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Oct 1 15:17:46 prod kernel: [11515.764512] [18146] 1001 18146 424925 16612 262144 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764525] [18341] 1001 18341 425526 24841 954368 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764527] [18342] 1001 18342 424958 388655 3293184 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764529] [18343] 1001 18343 424958 5540 180224 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764530] [18344] 1001 18344 425071 2106 180224 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764531] [18345] 1001 18345 16582 1156 135168 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764533] [18346] 1001 18346 425062 1743 159744 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764537] [18418] 1001 18418 425631 5605 196608 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764540] [18483] 1001 18483 428502 7794 225280 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764541] [18485] 1001 18485 1578423 1561820 12652544 0 0 postgres
Oct 1 15:17:46 prod kernel: [11515.764545] Memory cgroup out of memory: Kill process 18485 (postgres) score 994 or sacrifice child
Oct 1 15:17:46 prod kernel: [11515.767788] Killed process 18485 (postgres) total-vm:6313692kB, anon-rss:4614856kB, file-rss:11372kB, shmem-rss:1621052kB
Oct 1 15:17:47 prod kernel: [11516.011553] oom_reaper: reaped process 18485 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:1621052kB
I have already tried various combinations of memory sizes on the host, in the Docker configuration, and in the Postgres memory settings, but every time memory usage keeps climbing until the oom-killer steps in.
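For reference, the Postgres memory settings mentioned above are the usual postgresql.conf parameters; the values below are illustrative for a ~6 GB container, not my exact configuration:

shared_buffers = 1536MB          # shared memory for the buffer cache
work_mem = 64MB                  # per-sort/per-hash memory for queries
maintenance_work_mem = 1GB       # used by CREATE INDEX and VACUUM
effective_cache_size = 4GB       # planner hint only, not an allocation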
Why does Postgres not seem to play nicely and stay within the memory it is, as far as I can tell, allowed to allocate?