How to fix a Corda node crashing after "ArtemisMessagingClient failed. Shutting down."
When running 2 nodes and one notary on Corda OSS 4.3, the following errors occur (Amazon EFS is used for the Artemis service of each node and of the notary).
・Node A
[INFO ] 2021-03-24T01:53:33,526Z [nioEventLoopGroup-2-1] engine.ConnectionStateMachine. - Transport Error TransportImpl [_connectionEndpoint=org.apache.qpid.proton.engine.impl.ConnectionImpl@d8755f,org.apache.qpid.proton.engine.impl.TransportImpl@720cb721] {localLegalName=O=nodeA,L=Local,C=JP,remoteLegalName=O=nodeB,serverMode=false}
[INFO ] 2021-03-24T01:53:33,526Z [nioEventLoopGroup-2-1] engine.ConnectionStateMachine. - Error: connection aborted {localLegalName=O=nodeA,527Z [nioEventLoopGroup-2-1] netty.AMQPClient. - Disconnected from [NLBendpoint]:10005
[INFO ] 2021-03-24T01:53:33,527Z [nioEventLoopGroup-2-1] netty.AMQPChannelHandler. - Closed client connection 828af8c0 from [NLBendpoint]:10005 to /xx.xx.x.xx:40438 {allowedRemoteLegalNames=O=nodeB,localCert=O=nodeA,remoteAddress=[NLBendpoint]:10005,remoteCert=O=nodeB,527Z [nioEventLoopGroup-2-1] bridging.AMQPBridgeManager$AMQPBridge. - Bridge Disconnected {legalNames=O=nodeB,maxMessageSize=10485760,queueName=internal.peers.DLB29JcZp4kCP2aGGZKGkhw2X5RenndTjEK4xy48iT9643,targets=[NLBendpoint]:10005}
[WARN ] 2021-03-24T01:55:59,747Z [Thread-17936 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@2936f48a)] core.client. - AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /xxx.0.0.1:53166 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
[WARN ] 2021-03-24T01:55:59,748Z [Thread-949 (ActiveMQ-client-global-threads)] core.client. - AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@6eb6efd1[ID=e834052e,local= /127.0.0.1:53170,remote=localhost/127.0.0.1:10008] [code=CONNECTION_TIMEDOUT]
[WARN ] 2021-03-24T01:55:59,751Z [Thread-948 (ActiveMQ-client-global-threads)] core.client. - AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@505dd5b8[ID=f1885302,local= /127.0.0.1:53166,751Z [Thread-950 (ActiveMQ-client-global-threads)] core.client. - AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@57579387[ID=718e48b8,local= /127.0.0.1:53168,774Z [nioEventLoopGroup-2-1] netty.AMQPChannelHandler. - Closing channel due to nonrecoverable exception AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 68 {allowedRemoteLegalNames=O=nodeB,serverMode=false}
[INFO ] 2021-03-24T01:55:59,775Z [nioEventLoopGroup-2-1] netty.AMQPClient. - Retry connect to [NLBendpoint]:10005
[ERROR] 2021-03-24T01:55:59,779Z [Thread-612] errorAndTerminate. - ArtemisMessagingClient failed. Shutting down.
・Notary
[INFO ] 2021-03-24T01:53:34,850Z [nioEventLoopGroup-2-4] engine.ConnectionStateMachine. - Transport Error TransportImpl [_connectionEndpoint=org.apache.qpid.proton.engine.impl.ConnectionImpl@1a1be565,org.apache.qpid.proton.engine.impl.TransportImpl@1e6940e2] {localLegalName=O=Notary1,remoteLegalName=O=nodeA,serverMode=false}
[INFO ] 2021-03-24T01:53:34,850Z [nioEventLoopGroup-2-4] engine.ConnectionStateMachine. - Error: connection aborted {localLegalName=O=Notary1,851Z [nioEventLoopGroup-2-4] netty.AMQPClient. - Disconnected from [NLBendpoint]:10008
[INFO ] 2021-03-24T01:53:34,851Z [nioEventLoopGroup-2-4] netty.AMQPChannelHandler. - Closed client connection 9da3b393 from [NLBendpoint]:10008 to /xx.xx.x.xx:33438 {allowedRemoteLegalNames=O=nodeA,localCert=O=Notary1,remoteAddress=[NLBendpoint]:10008,remoteCert=O=nodeA,851Z [nioEventLoopGroup-2-4] bridging.AMQPBridgeManager$AMQPBridge. - Bridge Disconnected {legalNames=O=nodeA,queueName=internal.peers.DLHVntq87Ai3vLSuQzG8BoKcc2napU6aU3NPVFwiF73322,targets=[NLBendpoint]:10008}
[INFO ] 2021-03-24T01:54:03,123Z [nioEventLoopGroup-2-3] netty.AMQPClient. - Retry connect to [NLBendpoint]:10005
[WARN ] 2021-03-24T01:54:17,939Z [nioEventLoopGroup-2-2] netty.AMQPChannelHandler. - SSL Handshake timed out {allowedRemoteLegalNames=O=nodeA,localCert=null,remoteCert=null,serverMode=false}
[ERROR] 2021-03-24T01:54:17,939Z [nioEventLoopGroup-2-2] netty.AMQPChannelHandler. - Handshake failure handshake timed out {allowedRemoteLegalNames=O=nodeA,serverMode=false}
[INFO ] 2021-03-24T01:56:11,385Z [nioEventLoopGroup-2-2] netty.AMQPClient. - Retry connect to [NLBendpoint]:10005
[INFO ] 2021-03-24T01:56:11,392Z [nioEventLoopGroup-2-3] netty.AMQPClient. - Failed to connect to [NLBendpoint]:10005
[INFO ] 2021-03-24T01:56:13,393Z [nioEventLoopGroup-2-4] netty.AMQPClient. - Retry connect to [NLBendpoint]:10005
[INFO ] 2021-03-24T01:56:13,398Z [nioEventLoopGroup-2-1] netty.AMQPClient. - Failed to connect to [NLBendpoint]:10005
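After these logs were written, the nodeA process went down (the notary process kept running). What could be the cause of this problem? I suspect that the connection to the Artemis service was lost because of some problem with the connection to Amazon EFS, since the following was written to the OS log at around the same time.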
Mar 24 10:55:51 [serverName] stunnel: LOG5[4]: Connection reset: 1105153036 byte(s) sent to TLS,839120060 byte(s) sent to socket
Mar 24 10:55:54 [serverName] stunnel: LOG5[5]: Service [efs] accepted connection from xxx.x.x.x:38710
Mar 24 10:55:54 [serverName] stunnel: LOG5[5]: s_connect: connected xx.xx.x.xx:2049
Mar 24 10:55:54 [serverName] stunnel: LOG5[5]: Service [efs] connected remote server from xx.xx.x.xx:51468
Mar 24 10:55:55 [serverName] stunnel: LOG5[5]: Certificate accepted at depth=0: CN=*.efs.ap-northeast-1.amazonaws.com
Mar 24 10:55:55 [serverName] stunnel: LOG3[5]: transfer: s_poll_wait: TIMEOUTclose exceeded: closing
Mar 24 10:55:55 [serverName] stunnel: LOG5[5]: Connection closed: 0 byte(s) sent to TLS,0 byte(s) sent to socket
Mar 24 10:55:55 [serverName] stunnel: LOG5[6]: Service [efs] accepted connection from xxx.x.x.x:38716
Mar 24 10:55:55 [serverName] stunnel: LOG5[6]: s_connect: connected xx.xx.x.xx:2049
Mar 24 10:55:55 [serverName] stunnel: LOG5[6]: Service [efs] connected remote server from xx.xx.x.xx:51474
Solution
I believe we discussed this on Slack, but if you start a Corda node and it cannot bind to its p2p port or p2pAddress, that can cause Artemis errors like the ones you describe.
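To rule out the bind problem, something like the following Python sketch can be run on the node host while the node is stopped. It only checks whether a plain TCP socket can bind to a port. The ports 10005 and 10008 are taken from the logs in the question and are an assumption about your p2pAddress settings; `can_bind` is a hypothetical helper, not part of Corda.

```python
import socket


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typical causes: port already in use, or address not on this host.
            return False
        return True


if __name__ == "__main__":
    # Assumed ports, copied from the logs in the question; adjust to your node.conf.
    for port in (10005, 10008):
        state = "free" if can_bind("0.0.0.0", port) else "NOT bindable"
        print(f"port {port}: {state}")
```

If a port is reported as not bindable while the node is stopped, some other process is probably holding it, and the node would not be able to bind to it on startup either.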
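Something odd may also be going on in your network security groups. Make sure you can run this setup on your local machine, and that all nodes can ping/telnet each other on the ports you expect.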
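If telnet is not installed on the hosts, the same reachability check can be approximated with a short Python script. This is only a sketch: `can_connect` is a hypothetical helper, and the host:port targets are whatever you pass on the command line (for example your load balancer endpoint with port 10005 or 10008, as seen in the logs).

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Usage: python check_peers.py HOST:PORT [HOST:PORT ...]
    for target in sys.argv[1:]:
        host, sep, port = target.rpartition(":")
        if not sep or not port.isdigit():
            print(f"{target}: expected HOST:PORT")
            continue
        status = "reachable" if can_connect(host, int(port)) else "NOT reachable"
        print(f"{target}: {status}")
```

Run it from every node against every other node. Note that a successful TCP connect only shows that the port is open; it does not prove that the TLS handshake will complete, which is what timed out in the notary log.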