
Multi-layer backpropagation problem: the loss keeps growing

How do I fix a multi-layer backpropagation problem where the loss keeps growing?

Using only the numpy and pandas packages, I built a 6-layer MLP for the CIFAR-10 data, with a softmax activation on the output layer and sigmoid on the remaining layers, a cross-entropy loss function, and L2 regularization. I am running into the problem that the loss grows and grows and eventually becomes nan. I believe I have done something wrong in the backpropagation function. What am I doing wrong in this MLP?

The input x has shape 9000 x 3072.
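The post calls sigmoid, softmax, and sigmoid_der but never shows them. A minimal sketch of what typical implementations look like (an assumption — the author's versions may differ, and an unstabilized softmax or sigmoid is itself a common source of overflow and nan):

```python
import numpy as np

def sigmoid(z):
    # clip the pre-activation so np.exp cannot overflow to inf
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_der(a):
    # derivative written in terms of the activation a = sigmoid(z)
    return a * (1.0 - a)

def softmax(z):
    # subtract the row-wise max before exponentiating for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

Note that sigmoid_der here expects the activation a, not the pre-activation z, which matches how it is called in the Gradient function below (on param['a4'], etc.).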

Initial weights

weights['w1'] = np.random.rand(X_train.shape[1], 256) * np.sqrt(2/256)

weights['w2'] = np.random.rand(256, 256) * np.sqrt(2/256)

weights['w3'] = np.random.rand(256, 256) * np.sqrt(2/256)

weights['w4'] = np.random.rand(256, 256) * np.sqrt(2/256)

weights['w5'] = np.random.rand(256, 10) * np.sqrt(2/10)
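One thing worth noting about this initialization: np.random.rand draws from the uniform [0, 1) distribution, so every initial weight is positive, which tends to saturate sigmoid units. He-style initialization is usually written with a zero-mean Gaussian (np.random.randn) scaled by the fan-in. A sketch of that variant (a common convention, not necessarily what the author intended):

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    # zero-mean Gaussian scaled by sqrt(2 / fan_in), the usual He convention
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

rng = np.random.default_rng(0)
weights = {
    'w1': he_init(3072, 256, rng),  # 3072 = CIFAR-10 flattened input size
    'w2': he_init(256, 256, rng),
    'w3': he_init(256, 256, rng),
    'w4': he_init(256, 256, rng),
    'w5': he_init(256, 10, rng),
}
```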

Feedforward

def Feedforward(x, weights):
    param = {}
    param['a0'] = x
    param['z1'] = np.dot(param['a0'], weights['w1'])
    param['a1'] = sigmoid(param['z1'])

    param['z2'] = np.dot(param['a1'], weights['w2'])
    param['a2'] = sigmoid(param['z2'])

    param['z3'] = np.dot(param['a2'], weights['w3'])
    param['a3'] = sigmoid(param['z3'])

    param['z4'] = np.dot(param['a3'], weights['w4'])
    param['a4'] = sigmoid(param['z2'])

    param['z5'] = np.dot(param['a4'], weights['w5'])
    param['a5'] = softmax(param['z5'])

    return param

cross_entropy_L2_regularization

def cross_entropy(output, t):
    m = t.shape[1]
    logprobs = np.multiply(t, np.log(output)) + np.multiply((1 - t), np.log(1 - output))
    entropycost = (-1.0/m) * np.sum(logprobs)

    # Compute L2 regularization cost
    l2cost = (np.sum(np.square(weights['w1'])) + np.sum(np.square(weights['w2']))
              + np.sum(np.square(weights['w3'])) + np.sum(np.square(weights['w4']))
              + np.sum(np.square(weights['w5']))) * (lambd/(2*m))

    # add cross_entropy_cost and L2_regularization_cost
    cost = entropycost + l2cost

    return np.squeeze(cost)
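One common source of nan in a loss like this is np.log(0): once any softmax output reaches exactly 0 or 1 in floating point, log produces -inf and the loss becomes nan. A minimal sketch of a clipped, categorical version (cross_entropy_stable is a hypothetical name; it assumes t is one-hot with samples in rows — with a softmax output the binary-style (1 - t) * log(1 - output) term is not needed):

```python
import numpy as np

def cross_entropy_stable(output, t, eps=1e-12):
    # clip probabilities away from 0 and 1 so np.log never returns -inf
    output = np.clip(output, eps, 1.0 - eps)
    m = t.shape[0]  # number of samples, assuming samples are in rows
    # categorical cross-entropy for a softmax output layer
    return -np.sum(t * np.log(output)) / m
```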

Backpropagation

def Gradient(param, weights, t):
    m = t.shape[1]

    weight_update = {}

    error = t - param['a5']
    weight_update['dldw5'] = np.dot(param['a4'].T, error) + (lambd/m)*weights['w5']

    error = np.multiply(sigmoid_der(param['a4']), np.dot(error, weights['w5'].T))
    weight_update['dldw4'] = np.dot(param['a3'].T, error) + (lambd/m)*weights['w4']

    error = np.multiply(sigmoid_der(param['a3']), np.dot(error, weights['w4'].T))
    weight_update['dldw3'] = np.dot(param['a2'].T, error) + (lambd/m)*weights['w3']

    error = np.multiply(sigmoid_der(param['a2']), np.dot(error, weights['w3'].T))
    weight_update['dldw2'] = np.dot(param['a1'].T, error) + (lambd/m)*weights['w2']

    error = np.multiply(sigmoid_der(param['a1']), np.dot(error, weights['w2'].T))
    weight_update['dldw1'] = np.dot(param['a0'].T, error) + (lambd/m)*weights['w1']

    return weight_update
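The post does not show the parameter-update step, and the sign convention matters for whether the loss goes down or up: with error = t - a5, the dldw values point in the gradient-ascent direction of the log-likelihood, so the weights must be increased by them (or, equivalently, the gradients negated and subtracted). A sketch of a plain gradient-step helper under that assumption (apply_updates and lr are hypothetical names not in the original):

```python
import numpy as np

def apply_updates(weights, weight_update, lr=0.01):
    # in-place gradient step; with error = t - a5 the computed dldw values
    # point uphill on the likelihood, hence the += here (an assumption)
    for i in range(1, 6):
        key = 'w%d' % i
        weights[key] += lr * weight_update['dldw%d' % i]
    return weights
```

If the update rule subtracts these dldw values instead (the usual descent form for a loss), the loss will increase, which matches the symptom described in the question.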
