How to make sure a Spark job is using all available resources across all containers

I have a Spark streaming job that reads data from Kafka and writes it to a data warehouse.

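It runs fine, but what worries me is that when I monitor the logs, my application's log output appears in only one container, while the remaining containers show only GC (Allocation Failure) messages. I have tried different combinations of --num-executors and --executor-memory. Is the job really using all of the cluster's available resources, or is a single worker node doing all the work?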

I am using AWS EMR to deploy my job. The cluster has one master node (m5.xlarge) and 2 data nodes (m5.xlarge). Each node has 16 GB of memory and 4 vCPUs.
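As a rough way to reason about the node sizes above, the sketch below estimates how many executor containers could fit on one data node for a given --executor-memory and --executor-cores. It is only an approximation under stated assumptions: the function name and the 12288 MB figure for memory available to YARN are hypothetical (the real value comes from yarn.nodemanager.resource.memory-mb on the cluster), the overhead follows Spark's documented default for spark.executor.memoryOverhead (10% of executor memory, at least 384 MB), and it ignores YARN's allocation rounding, the application master container, and whether YARN is configured to account for vcores at all.

```python
def executors_per_node(node_mem_mb, node_vcores, executor_mem_mb, executor_cores,
                       overhead_fraction=0.10, min_overhead_mb=384):
    """Estimate how many executor containers fit on one node.

    node_mem_mb is the memory YARN may hand out on the node (an assumed
    input, not the full 16 GB of the instance).
    """
    # Each container needs the executor heap plus off-heap overhead.
    overhead_mb = max(min_overhead_mb, int(executor_mem_mb * overhead_fraction))
    container_mb = executor_mem_mb + overhead_mb
    by_memory = node_mem_mb // container_mb
    by_cores = node_vcores // executor_cores
    # The scarcer resource limits the count.
    return min(by_memory, by_cores)


if __name__ == "__main__":
    # Hypothetical settings for a node with 4 vCPUs; 12288 MB is an assumption.
    for mem_mb, cores in [(4096, 2), (2048, 1)]:
        n = executors_per_node(12288, 4, mem_mb, cores)
        print(f"--executor-memory {mem_mb}m --executor-cores {cores}: {n} per node")
```

Multiplying the per-node result by the 2 data nodes gives an upper bound to compare against the executor count shown in the Spark UI's Executors tab.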

Logs from one of the containers:

2021-03-25T17:30:30.539+0000: [GC (Allocation Failure) 2021-03-25T17:30:30.539+0000: [ParNew: 73307K->6385K(74944K),0.0072151 secs] 161323K->95020K(241536K),0.0073002 secs] [Times: user=0.02 sys=0.00,real=0.01 secs] 
2021-03-25T17:30:31.249+0000: [GC (Allocation Failure) 2021-03-25T17:30:31.249+0000: [ParNew: 73009K->4636K(74944K),0.0065039 secs] 161644K->94637K(241536K),0.0065701 secs] [Times: user=0.01 sys=0.01,real=0.01 secs] 
2021-03-25T17:30:31.457+0000: [GC (Allocation Failure) 2021-03-25T17:30:31.457+0000: [ParNew: 71260K->3970K(74944K),0.0040602 secs] 161261K->93971K(241536K),0.0041415 secs] [Times: user=0.02 sys=0.00,real=0.00 secs] 
2021-03-25T17:30:31.700+0000: [GC (Allocation Failure) 2021-03-25T17:30:31.700+0000: [ParNew: 70594K->5094K(74944K),0.0044768 secs] 160595K->95095K(241536K),0.0045439 secs] [Times: user=0.02 sys=0.00,real=0.00 secs] 
2021-03-25T17:30:33.044+0000: [GC (Allocation Failure) 2021-03-25T17:30:33.044+0000: [ParNew: 71718K->7049K(74944K),0.0131990 secs] 161719K->97050K(241536K),0.0132825 secs] [Times: user=0.04 sys=0.00,real=0.02 secs] 
2021-03-25T17:30:35.042+0000: [GC (Allocation Failure) 2021-03-25T17:30:35.042+0000: [ParNew: 73673K->7211K(74944K),0.0063486 secs] 163674K->97868K(241536K),0.0064325 secs] [Times: user=0.03 sys=0.00,real=0.01 secs] 
2021-03-25T17:30:35.202+0000: [GC (Allocation Failure) 2021-03-25T17:30:35.202+0000: [ParNew: 73835K->8320K(74944K),0.0125162 secs] 164492K->103499K(241536K),0.0125880 secs] [Times: user=0.04 sys=0.00,real=0.02 secs] 
2021-03-25T17:30:35.340+0000: [GC (Allocation Failure) 2021-03-25T17:30:35.340+0000: [ParNew: 74904K->8320K(74944K),0.0165272 secs] 170084K->116030K(241536K),0.0166063 secs] [Times: user=0.05 sys=0.01,real=0.02 secs] 
2021-03-25T17:30:37.309+0000: [GC (Allocation Failure) 2021-03-25T17:30:37.309+0000: [ParNew: 74944K->8319K(74944K),0.0116421 secs] 182654K->117918K(241536K),0.0117176 secs] [Times: user=0.04 sys=0.00,real=0.01 secs] 
2021-03-25T17:30:37.441+0000: [GC (Allocation Failure) 2021-03-25T17:30:37.441+0000: [ParNew: 74943K->6171K(74944K),0.0188051 secs] 184542K->120606K(241536K),0.0188938 secs] [Times: user=0.05 sys=0.00,real=0.02 secs] 
2021-03-25T17:30:37.652+0000: [GC (Allocation Failure) 2021-03-25T17:30:37.652+0000: [ParNew: 72674K->8320K(74944K),0.0080351 secs] 187109K->124677K(241536K),0.0081014 secs] [Times: user=0.03 sys=0.00,real=0.01 secs] 
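To make lines like these easier to read, here is a minimal parsing sketch. In HotSpot GC logs, "Allocation Failure" is the normal trigger for a young-generation collection rather than an error, so the numbers of interest are the heap sizes and pause times. The regular expression is written against the exact format shown above, and the function names are hypothetical helpers, not part of Spark or the JVM.

```python
import re

# Matches ParNew (young-generation) GC lines in the format shown above.
GC_LINE = re.compile(
    r"\[ParNew: (?P<young_before>\d+)K->(?P<young_after>\d+)K"
    r"\((?P<young_cap>\d+)K\),\s*[\d.]+ secs\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K"
    r"\((?P<heap_cap>\d+)K\),\s*(?P<pause>[\d.]+) secs\]"
)


def parse_gc_line(line):
    """Return sizes in KB and the pause in seconds, or None if no match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    fields = {k: int(v) for k, v in m.groupdict().items() if k != "pause"}
    fields["pause_secs"] = float(m.group("pause"))
    return fields


def summarize(lines):
    """Aggregate all matching GC lines; returns None if none match."""
    events = [e for e in map(parse_gc_line, lines) if e is not None]
    if not events:
        return None
    return {
        "events": len(events),
        # Current heap capacity reported by the most recent event.
        "heap_capacity_mb": events[-1]["heap_cap"] / 1024,
        "max_heap_after_mb": max(e["heap_after"] for e in events) / 1024,
        "total_pause_secs": sum(e["pause_secs"] for e in events),
    }


if __name__ == "__main__":
    sample = (
        "2021-03-25T17:30:30.539+0000: [GC (Allocation Failure) "
        "2021-03-25T17:30:30.539+0000: [ParNew: 73307K->6385K(74944K),"
        "0.0072151 secs] 161323K->95020K(241536K),0.0073002 secs] "
        "[Times: user=0.02 sys=0.00,real=0.01 secs]"
    )
    print(summarize([sample]))
```

The current heap capacity reported here can then be compared with the value passed to --executor-memory to see which kind of JVM the container is running.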

Spark version 2.4.0, Scala version 2.11.12

Thanks