How do I fix backpropagation in a multi-layer network when the loss keeps growing?
I built a 6-layer MLP for the CIFAR-10 data using only the NumPy and pandas packages, with a softmax activation at the output, sigmoid on the remaining layers, a cross-entropy loss function, and L2 regularization. My problem is that the loss keeps growing and eventually becomes nan. I believe I'm doing something wrong in the backpropagation function. What am I doing wrong in this MLP?
The input X has shape 9000 x 3072.
Initial weights
def init_weights(X_train):
    weights = {}
    weights['w1'] = np.random.rand(X_train.shape[1], 256) * np.sqrt(2/256)
    weights['w2'] = np.random.rand(256, 256) * np.sqrt(2/256)
    weights['w3'] = np.random.rand(256, 256) * np.sqrt(2/256)
    weights['w4'] = np.random.rand(256, 256) * np.sqrt(2/256)
    weights['w5'] = np.random.rand(256, 10) * np.sqrt(2/10)
    return weights
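The helper functions sigmoid, softmax, and sigmoid_der are used below but not shown; a minimal sketch of what the code appears to assume (note that sigmoid_der here takes the already-activated value a = sigmoid(z), matching how Gradient calls it) could look like:

```python
import numpy as np

def sigmoid(z):
    # logistic function
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_der(a):
    # derivative of sigmoid expressed in terms of its output a = sigmoid(z)
    return a * (1.0 - a)

def softmax(z):
    # row-wise softmax; subtracting the row max avoids overflow in exp
    z_shift = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z_shift)
    return e / np.sum(e, axis=1, keepdims=True)
```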
Feedforward
def Feedforward(x, weights):
    param = {}
    param['a0'] = x
    param['z1'] = np.dot(param['a0'], weights['w1'])
    param['a1'] = sigmoid(param['z1'])
    param['z2'] = np.dot(param['a1'], weights['w2'])
    param['a2'] = sigmoid(param['z2'])
    param['z3'] = np.dot(param['a2'], weights['w3'])
    param['a3'] = sigmoid(param['z3'])
    param['z4'] = np.dot(param['a3'], weights['w4'])
    param['a4'] = sigmoid(param['z2'])
    param['z5'] = np.dot(param['a4'], weights['w5'])
    param['a5'] = softmax(param['z5'])
    return param
Cross-entropy loss with L2 regularization
def cross_entropy(output, t):
    m = t.shape[1]
    logprobs = np.multiply(t, np.log(output)) + np.multiply((1 - t), np.log(1 - output))
    entropycost = (-1.0/m) * np.sum(logprobs)
    # Compute L2 regularization cost
    l2cost = (np.sum(np.square(weights['w1'])) + np.sum(np.square(weights['w2']))
              + np.sum(np.square(weights['w3'])) + np.sum(np.square(weights['w4']))
              + np.sum(np.square(weights['w5']))) * (lambd/(2*m))
    # add cross_entropy_cost and L2_regularization_cost
    cost = entropycost + l2cost
    return np.squeeze(cost)
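For reference, one common way a loss like this turns into nan is np.log(output) hitting exactly 0 or 1 once the network saturates. A hedged sketch of a clipped categorical cross-entropy for softmax outputs with one-hot targets (cross_entropy_stable and eps are my own names, not from the code above):

```python
import numpy as np

def cross_entropy_stable(output, t, eps=1e-12):
    # categorical cross-entropy for softmax outputs and one-hot targets t;
    # clipping keeps log() away from 0, so the loss never becomes -inf/nan
    m = t.shape[0]  # number of examples (rows), not t.shape[1]
    output = np.clip(output, eps, 1.0 - eps)
    return -np.sum(t * np.log(output)) / m
```

Whether m should be t.shape[0] or t.shape[1] depends on your layout; with targets of shape (9000, 10), shape[0] is the batch size.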
Backpropagation
def Gradient(param, weights, t):
    m = t.shape[1]
    weight_update = {}
    error = t - param['a5']
    weight_update['dldw5'] = np.dot(param['a4'].T, error) + (lambd/m)*weights['w5']
    error = np.multiply(sigmoid_der(param['a4']), np.dot(error, weights['w5'].T))
    weight_update['dldw4'] = np.dot(param['a3'].T, error) + (lambd/m)*weights['w4']
    error = np.multiply(sigmoid_der(param['a3']), np.dot(error, weights['w4'].T))
    weight_update['dldw3'] = np.dot(param['a2'].T, error) + (lambd/m)*weights['w3']
    error = np.multiply(sigmoid_der(param['a2']), np.dot(error, weights['w3'].T))
    weight_update['dldw2'] = np.dot(param['a1'].T, error) + (lambd/m)*weights['w2']
    error = np.multiply(sigmoid_der(param['a1']), np.dot(error, weights['w2'].T))
    weight_update['dldw1'] = np.dot(param['a0'].T, error) + (lambd/m)*weights['w1']
    return weight_update
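The question does not show how weight_update is applied. A minimal gradient-descent step, assuming sgd_step and lr are my own names and that each dldw* is the gradient of the loss with respect to the corresponding weight matrix:

```python
import numpy as np

def sgd_step(weights, weight_update, lr=0.01):
    # in-place gradient-descent update: w <- w - lr * dL/dw
    for i in range(1, 6):
        weights['w%d' % i] -= lr * weight_update['dldw%d' % i]
    return weights
```

Note that the sign convention matters: with error = t - a5 (the negative of dL/dz for softmax plus cross-entropy), the dldw* terms carry the opposite sign of the loss gradient, so subtracting them in the update ascends rather than descends the loss.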