Network values become 0 after passing through a linear layer

How can I stop network values from becoming 0 after passing through a linear layer?

I designed a graph attention network.
However, during the computation inside the layer, the element values all become equal.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    # in_features = out_features = 1024
    def __init__(self, in_features, out_features, dropout):
        super().__init__()
        self.dropout = dropout
        self.in_features = in_features
        self.out_features = out_features

        # Learnable projection W and the two attention vectors a1, a2
        self.W = nn.Parameter(torch.zeros(size=(in_features, out_features)))
        self.a1 = nn.Parameter(torch.zeros(size=(out_features, 1)))
        self.a2 = nn.Parameter(torch.zeros(size=(out_features, 1)))
        nn.init.xavier_normal_(self.W.data, gain=1.414)
        nn.init.xavier_normal_(self.a1.data, gain=1.414)
        nn.init.xavier_normal_(self.a2.data, gain=1.414)
        self.leakyrelu = nn.LeakyReLU()

    def forward(self, input, adj):
        h = torch.mm(input, self.W)          # (N, out_features)
        a_input1 = torch.mm(h, self.a1)      # (N, 1)
        a_input2 = torch.mm(h, self.a2)      # (N, 1)
        a_input = torch.mm(a_input1, a_input2.transpose(1, 0))  # (N, N)
        e = self.leakyrelu(a_input)

        # Entries without an edge in adj are set to 0 before the softmax
        zero_vec = torch.zeros_like(e)
        attention = torch.where(adj > 0, e, zero_vec)  # most values are close to 0
        attention = F.softmax(attention, dim=1)  # all values are 0.0014, i.e. 1/707 (attention is 707 x 707)
        attention = F.dropout(attention, self.dropout)
        return attention

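The shape of `attention` is (707 x 707), and I observed that the attention values are close to 0 before the softmax.
After the softmax, all values are 0.0014, which is 1/707.
I would like to know how to keep the values normalized while preventing this from happening.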

Thanks

Solution

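Since you say this happens during training, I assume it is at the very beginning. With random initialization, you will often get nearly identical values at the end of the network when training starts.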

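When all the values are more or less equal, the softmax output for each element is 1/num_elements, so that they sum to 1 over the dimension you chose. In your case you therefore get 1/707 for every value, which sounds to me like your weights are freshly initialized and the outputs are mostly random at this stage.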
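To illustrate why near-equal inputs give a uniform softmax, here is a minimal sketch in plain Python (standard library only, not PyTorch). The `softmax` helper and the sample values are made up for illustration; only the row length 707 comes from the question.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


n = 707  # row length taken from the question

# Hypothetical pre-softmax row: every entry is tiny, as reported in the question.
row = [1e-4 * math.sin(i) for i in range(n)]
probs = softmax(row)

print(1 / n)                   # about 0.0014
print(min(probs), max(probs))  # both very close to 1/n
```

Because the inputs differ by at most 2e-4, the exponentials are almost identical and every output lands within a tiny fraction of 1/707.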

I would let it train for a while and see whether this changes.
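One possible way to watch for a change is to log how far the softmax output is from uniform as training proceeds. The helper below is a hypothetical sketch in plain Python; `max_deviation_from_uniform` is not a PyTorch function, and the two small matrices are invented examples. With a real tensor you could first convert the softmax output (before dropout) using `tolist()`.

```python
def max_deviation_from_uniform(rows):
    """Return the largest |p - 1/n| over all entries of a row-normalized matrix.

    A value near 0 means every row is still (almost) uniform.
    """
    worst = 0.0
    for row in rows:
        uniform = 1.0 / len(row)
        for p in row:
            worst = max(worst, abs(p - uniform))
    return worst


# Hypothetical examples: a uniform matrix and a peaked one.
uniform_rows = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]
peaked_rows = [[0.7, 0.1, 0.1, 0.1] for _ in range(4)]

print(max_deviation_from_uniform(uniform_rows))  # 0.0
print(max_deviation_from_uniform(peaked_rows))   # about 0.45
```

If this number stays near zero after many training steps, the attention has not moved away from the uniform distribution.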