How to resolve long G1GC pauses in Cassandra that cause dropped mutations
I am running a Cassandra 3.0.11 cluster with 3 DCs and 10 nodes.
I frequently see the following messages:
WARN [Service Thread] 2021-02-10 14:03:10,219 GCInspector.java:282 - G1 Young Generation GC in 1317ms. G1 Eden Space: 4546625536 -> 0; G1 Old Gen: 22573336584 -> 25140250632; G1 Survivor Space: 1124073472 -> 721420288;
WARN [Service Thread] 2021-02-10 14:03:11,916 GCInspector.java:282 - G1 Young Generation GC in 1382ms. G1 Eden Space: 989855744 -> 0; G1 Old Gen: 25140250632 -> 26364987400; G1 Survivor Space: 721420288 -> 218103808;
WARN [Service Thread] 2021-02-10 14:03:49,801 GCInspector.java:282 - G1 Young Generation GC in 1072ms. G1 Eden Space: 4496293888 -> 0; G1 Old Gen: 17078798632 -> 19586992416; G1 Survivor Space: 620756992 -> 654311424;
WARN [Service Thread] 2021-02-10 14:03:51,471 GCInspector.java:282 - G1 Young Generation GC in 1336ms. G1 Eden Space: 1056964608 -> 0; G1 Old Gen: 19586992416 -> 20870449448; G1 Survivor Space: 654311424 -> 218103808;
WARN [Service Thread] 2021-02-10 14:04:42,262 GCInspector.java:282 - G1 Young Generation GC in 8909ms. G1 Eden Space: 1493172224 -> 0; G1 Old Gen: 32195070248 -> 34099284256;
WARN [Service Thread] 2021-02-10 14:04:44,990 GCInspector.java:282 - G1 Young Generation GC in 2520ms. G1 Old Gen: 34099284256 -> 34317388064; G1 Survivor Space: 218103808 -> 0;
WARN [Service Thread] 2021-02-10 14:04:47,245 GCInspector.java:282 - G1 Old Generation GC in 28836ms. G1 Old Gen: 34317388064 -> 11666582136; Metaspace: 49839232 -> 49835448
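One way to make these lines easier to compare is to extract the pause time and the before/after size of each memory pool. The following is only a minimal sketch, assuming the log lines keep exactly the GCInspector format quoted above; `parse_gc_line` and `summarize` are hypothetical helper names, not part of Cassandra.

```python
import re

# Matches GCInspector WARN lines in the format quoted above (assumed stable).
LINE_RE = re.compile(r"(G1 (?:Young|Old) Generation) GC in (\d+)ms\.(.*)")
# Matches one "pool name: before -> after" entry.
POOL_RE = re.compile(r"([A-Za-z0-9 ]+?): (\d+) -> (\d+)")


def parse_gc_line(line):
    """Return (collector, pause_ms, {pool: (before, after)}) or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    pools = {}
    for name, before, after in POOL_RE.findall(m.group(3)):
        pools[name.strip()] = (int(before), int(after))
    return m.group(1), int(m.group(2)), pools


def summarize(lines):
    """Print pause and Old Gen change per GC line; return the worst pause in ms."""
    worst = 0
    for line in lines:
        parsed = parse_gc_line(line)
        if parsed is None:
            continue
        collector, pause_ms, pools = parsed
        worst = max(worst, pause_ms)
        if "G1 Old Gen" in pools:
            before, after = pools["G1 Old Gen"]
            delta_mib = (after - before) / 2**20
            print(f"{collector}: {pause_ms} ms, Old Gen change {delta_mib:+.0f} MiB")
        else:
            print(f"{collector}: {pause_ms} ms")
    return worst


# Two lines copied from the log above.
SAMPLE = [
    "WARN [Service Thread] 2021-02-10 14:03:10,219 GCInspector.java:282 - "
    "G1 Young Generation GC in 1317ms. G1 Eden Space: 4546625536 -> 0; "
    "G1 Old Gen: 22573336584 -> 25140250632; "
    "G1 Survivor Space: 1124073472 -> 721420288;",
    "WARN [Service Thread] 2021-02-10 14:04:47,245 GCInspector.java:282 - "
    "G1 Old Generation GC in 28836ms. G1 Old Gen: 34317388064 -> 11666582136; "
    "Metaspace: 49839232 -> 49835448",
]

if __name__ == "__main__":
    summarize(SAMPLE)
```

On the first sample line the Old Gen grows by 2448 MiB during a single young collection, which suggests that a lot of data is being promoted out of the young generation rather than dying there.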
I am using G1GC with a 32 GB heap. As a result, I frequently see dropped mutations:
Pool Name                   Active   Pending    Completed   Blocked   All time blocked
MutationStage                    0         0   1747789164         0                  0
ViewMutationStage                0         0            0         0                  0
ReadStage                        0         0     12399767         0                  0
RequestResponseStage             0         0    627930907         0                  0
ReadRepairStage                  0         0        60775         0                  0
CounterMutationStage             0         0            0         0                  0
MiscStage                        0         0            0         0                  0
CompactionExecutor               0         0      2101437         0                  0
MemtableReclaimMemory            0         0         4381         0                  0
PendingRangeCalculator           0         0           66         0                  0
GossipStage                      0         0      1350977         0                  0
SecondaryIndexManagement         0         0            0         0                  0
HintsDispatcher                  0         0        11394         0                  0
MigrationStage                   0         0       207917         0                  0
MemtablePostFlush                0         0         3667         0                  0
ValidationExecutor               0         0            0         0                  0
Sampler                          0         0            0         0                  0
MemtableFlushWriter              0         0         2926         0                  0
InternalResponseStage            0         0       420120         0                  0
AntiEntropyStage                 0         0            0         0                  0
CacheCleanupExecutor             0         0            0         0                  0
Native-Transport-Requests        3         0   3503749628         0           12323589
Message type           Dropped
READ                     66919
RANGE_SLICE               8260
_TRACE                       0
HINT                   2208871
MUTATION               5207285
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE         16491
PAGED_RANGE                  0
READ_REPAIR                  9
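To put the dropped counts in proportion, they can be compared against the completed count of the matching stage. This is a rough sketch only: `parse_dropped` is a hypothetical helper, and it assumes that dropped mutations are not already included in the MutationStage "Completed" figure and that both counters cover the same period.

```python
def parse_dropped(text):
    """Parse 'Message type / Dropped' rows into {message_type: count}."""
    dropped = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly two fields; the header row has three words.
        if len(parts) == 2 and parts[1].isdigit():
            dropped[parts[0]] = int(parts[1])
    return dropped


# A few rows copied from the output above.
SAMPLE = """Message type Dropped
READ 66919
HINT 2208871
MUTATION 5207285
"""

# MutationStage "Completed" value from the thread pool table above.
MUTATIONS_COMPLETED = 1747789164

dropped = parse_dropped(SAMPLE)
ratio = dropped["MUTATION"] / (dropped["MUTATION"] + MUTATIONS_COMPLETED)
print(f"dropped mutation share: {ratio:.3%}")
```

Under those assumptions the dropped share is roughly 0.3% of all mutations on this node.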
I tried the sjk tool, but it mostly just shows SharedPool-Worker threads:
Monitoring threads ...
2021-02-10T14:01:27.672-0700 Process summary
process cpu=355.30%
application cpu=362.09% (user=322.78% sys=39.31%)
other: cpu=-6.79%
thread count: 823
heap allocation rate 1168mb/s
[000642] user=26.57% sys= 0.86% alloc= 119mb/s - SharedPool-Worker-10
[000647] user=23.41% sys= 0.93% alloc= 115mb/s - SharedPool-Worker-12
[000636] user=25.83% sys= 2.34% alloc= 111mb/s - SharedPool-Worker-4
[000634] user=20.25% sys= 0.27% alloc= 100mb/s - SharedPool-Worker-2
[000652] user=19.14% sys= 0.17% alloc= 99mb/s - SharedPool-Worker-19
[000648] user=19.14% sys= 0.19% alloc= 98mb/s - SharedPool-Worker-16
[000637] user=21.00% sys= 0.25% alloc= 94mb/s - SharedPool-Worker-5
[000633] user=12.82% sys= 2.51% alloc= 32mb/s - SharedPool-Worker-1
[000654] user= 7.25% sys= 0.76% alloc= 31mb/s - SharedPool-Worker-20
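To see how much of the total allocation rate the listed threads account for, the per-thread `alloc=` figures can be summed and compared with the process-level "heap allocation rate" line. This is a minimal sketch, assuming the sjk output keeps the layout shown above; `summarize_ttop` is a hypothetical helper name.

```python
import re

# Per-thread line, e.g. "[000642] user=26.57% sys= 0.86% alloc= 119mb/s - SharedPool-Worker-10"
THREAD_RE = re.compile(r"alloc=\s*(\d+)mb/s\s+-\s+(\S+)")
# Process summary line, e.g. "heap allocation rate 1168mb/s"
TOTAL_RE = re.compile(r"heap allocation rate\s+(\d+)mb/s")


def summarize_ttop(text):
    """Return (total_rate, {thread_name: rate}) in the units sjk prints."""
    total = None
    per_thread = {}
    for line in text.splitlines():
        m = TOTAL_RE.search(line)
        if m:
            total = int(m.group(1))
            continue
        m = THREAD_RE.search(line)
        if m:
            per_thread[m.group(2)] = int(m.group(1))
    return total, per_thread


# Lines copied from the sjk output above.
SAMPLE = """heap allocation rate 1168mb/s
[000642] user=26.57% sys= 0.86% alloc= 119mb/s - SharedPool-Worker-10
[000647] user=23.41% sys= 0.93% alloc= 115mb/s - SharedPool-Worker-12
[000636] user=25.83% sys= 2.34% alloc= 111mb/s - SharedPool-Worker-4
[000634] user=20.25% sys= 0.27% alloc= 100mb/s - SharedPool-Worker-2
[000652] user=19.14% sys= 0.17% alloc= 99mb/s - SharedPool-Worker-19
[000648] user=19.14% sys= 0.19% alloc= 98mb/s - SharedPool-Worker-16
[000637] user=21.00% sys= 0.25% alloc= 94mb/s - SharedPool-Worker-5
[000633] user=12.82% sys= 2.51% alloc= 32mb/s - SharedPool-Worker-1
[000654] user= 7.25% sys= 0.76% alloc= 31mb/s - SharedPool-Worker-20
"""

total, per_thread = summarize_ttop(SAMPLE)
listed = sum(per_thread.values())
print(f"listed threads: {listed} of {total} mb/s ({listed / total:.0%})")
```

The nine threads shown add up to 799 of the 1168 mb/s reported, so most of the allocation comes from the SharedPool-Worker threads, but the output alone does not say which requests they are serving.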
What is the best way to find out what is filling the heap and triggering these GCs?
Update: CPU info
~]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             32
NUMA node(s):          32
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 61
Model name:            Intel Core Processor (Broadwell, IBRS)
Stepping:              2
CPU MHz:               2095.320
BogoMIPS:              4190.64
Virtualization:        VT-x
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
Total memory
176 GB
Client connections
sudo netstat | grep 9042 | grep ESTABLISHED| wc -l
295