如何解决故障转移到从属后,Artemis 主节点未启动
我有一个通过 NFS 使用共享存储配置的 Artemis 2.11.0 HA 对(但我不知道安装选项)。这是主人的broker.xml
:
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<name>0.0.0.0</name>
<persistence-enabled>true</persistence-enabled>
<journal-type>NIO</journal-type>
<paging-directory>\\test\data\paging</paging-directory>
<bindings-directory>\\test\data\bindings</bindings-directory>
<journal-directory>\\test\data\journal</journal-directory>
<large-messages-directory>\\test\data\large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>2</journal-min-files>
<journal-pool-files>10</journal-pool-files>
<journal-device-block-size>4096</journal-device-block-size>
<journal-file-size>10M</journal-file-size>
<journal-buffer-timeout>752000</journal-buffer-timeout>
<journal-max-io>1</journal-max-io>
<disk-scan-period>5000</disk-scan-period>
<max-disk-usage>90</max-disk-usage>
<!-- should the broker detect dead locks and other issues -->
<critical-analyzer>false</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<page-sync-timeout>1028000</page-sync-timeout>
<global-max-size>4096Mb</global-max-size>
<ha-policy>
<shared-store>
<master>
<failover-on-shutdown>true</failover-on-shutdown>
</master>
</shared-store>
</ha-policy>
<connectors>
<connector name="netty">tcp://ip-address:61617</connector>
</connectors>
<broadcast-groups>
<broadcast-group name="teamsMQ-broadcast-group">
<local-bind-address>ip-address</local-bind-address>
<local-bind-port>9877</local-bind-port>
<group-address>224.0.0.1</group-address>
<group-port>9876</group-port>
<broadcast-period>2000</broadcast-period>
<connector-ref>netty</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="teamsMQ-discovery-group">
<local-bind-address>ip-address</local-bind-address>
<group-address>224.0.0.1</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<acceptors>
<acceptor name="netty">tcp://ip-address:61617</acceptor>
<!-- Acceptor for every supported protocol -->
<acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
<!-- AMQP Acceptor. Listens on default AMQP port for AMQP traffic.-->
<acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
<!-- STOMP Acceptor. -->
<acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
<!-- HornetQ Compatibility Acceptor. Enables HornetQ Core and STOMP for legacy HornetQ clients. -->
<acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
<!-- MQTT Acceptor -->
<acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
</acceptors>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues,management has to be auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
</core>
</configuration>
还有奴隶的:
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<name>0.0.0.0</name>
<persistence-enabled>true</persistence-enabled>
<journal-type>NIO</journal-type>
<paging-directory>\\test\data\paging</paging-directory>
<bindings-directory>\\test\data\bindings</bindings-directory>
<journal-directory>\\test\data\journal</journal-directory>
<large-messages-directory>\\test\data\large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>2</journal-min-files>
<journal-pool-files>10</journal-pool-files>
<journal-device-block-size>4096</journal-device-block-size>
<journal-file-size>10M</journal-file-size>
<journal-buffer-timeout>688000</journal-buffer-timeout>
<journal-max-io>1</journal-max-io>
<disk-scan-period>5000</disk-scan-period>
<max-disk-usage>90</max-disk-usage>
<!-- should the broker detect dead locks and other issues -->
<critical-analyzer>false</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<page-sync-timeout>1028000</page-sync-timeout>
<global-max-size>4096Mb</global-max-size>
<ha-policy>
<shared-store>
<slave>
<failover-on-shutdown>true</failover-on-shutdown>
<allow-failback>true</allow-failback>
</slave>
</shared-store>
</ha-policy>
<connectors>
<connector name="netty">tcp://ip-address:61617</connector>
</connectors>
<broadcast-groups>
<broadcast-group name="teamsMQ-broadcast-group">
<local-bind-address>ip-address</local-bind-address>
<local-bind-port>9877</local-bind-port>
<group-address>224.0.0.1</group-address>
<group-port>9876</group-port>
<broadcast-period>2000</broadcast-period>
<connector-ref>netty</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="teamsMQ-discovery-group">
<local-bind-address>ip-address</local-bind-address>
<group-address>224.0.0.1</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<acceptors>
<acceptor name="netty">tcp://ip-address:61617</acceptor>
<!-- Acceptor for every supported protocol -->
<acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,STOMP;useEpoll=true</acceptor>
<!-- MQTT Acceptor -->
<acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
</acceptors>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues,management has to be auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
</core>
</configuration>
主节点因未知原因宕机。下面的日志被连续打印:
AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED,was [STOPPED]]
2021-01-15 23:02:05,414 WARN [org.apache.activemq.artemis.core.server] AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED,was [STOPPED]]
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkJournalIsLoaded(JournalImpl.java:1087) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendUpdateRecord(JournalImpl.java:886) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.Journal.appendUpdateRecord(Journal.java:98) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.updateDeliveryCount(AbstractJournalStorageManager.java:756) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.QueueImpl.checkRedelivery(QueueImpl.java:3052) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.RefsOperation.rollbackRedelivery(RefsOperation.java:166) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterRollback(RefsOperation.java:113) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:589) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$200(TransactionImpl.java:40) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:442) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:244) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.11.0.jar:2.11.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_275]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_275]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.11.0.jar:2.11.0]
Slave 按预期出现,但抛出 NPE:
2021-01-15 23:02:27,529 INFO [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now live
2021-01-15 23:02:27,545 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.NullPointerException
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation$FailbackChecker.<init>(SharedStoreBackupActivation.java:193) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.startFailbackChecker(SharedStoreBackupActivation.java:185) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:118) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3863) [artemis-server-2.11.0.jar:2.11.0]
master 尝试启动,但没有超过 AMQ221034: Waiting indefinitely to obtain live lock
。即使经过多次重启,日志仍然停留在这一点。
2021-01-15 23:03:56,238 INFO [org.apache.activemq.artemis.core.server] AMQ221006: Waiting to obtain live lock
2021-01-15 23:03:56,300 INFO [org.apache.activemq.artemis.core.server] AMQ221013: Using NIO Journal
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2021-01-15 23:03:56,644 WARN [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\cd776bae-1a55-11eb-985d-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a84f1e4f-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a87edff5-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,988 INFO [org.apache.activemq.artemis.core.server] AMQ221034: Waiting indefinitely to obtain live lock
您能否就此处的问题和恢复步骤提供建议?
slave 启动中的 NPE 对队列/功能有影响吗?
是否需要手动停止 Slave 才能让 Master 成功启动?
解决方法
如果主代理遇到任何“关键 IO”问题,它将自动关闭。当它自行关闭时,它将放弃它在共享日志上的锁定。当共享锁被放弃时,slave 将自动激活。当主机重新启动时,它将尝试获取共享日志上的锁,但由于从机拥有它而无法获取。
由于 FailbackChecker
,从属设备未能设置 NullPointerException
线程,因为在任一 <cluster-connection>
中都没有配置 broker.xml
。这是一个无效配置。您必须配置一个 <cluster-connection>
,例如:
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>netty</connector-ref>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<discovery-group-ref discovery-group-name="teamsMQ-discovery-group"/>
</cluster-connection>
</cluster-connections>
由于 FailbackChecker
线程未运行,从站将不知道主站已重新启动并启动故障回复。因此,您需要停止从站,以便它可以放弃对共享日志的锁定。此时主代理将启动。请记住,任何连接到从站的客户端都将断开连接,并且必须重新连接到主站。
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。