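How to fix weight updates returning zeros instead of numbers in backpropagation with stochastic gradient descent
I have been struggling with this problem for a while. I am trying to build an MLP on the MNIST data from Kaggle for testing.
Here is what my weights and activation function look like: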
import random
import numpy as np

# train and test are pandas DataFrames; n_hidden is defined elsewhere

# format dataset: scale pixels to [0, 1], one-hot encode labels as 0.1 / 0.9
x_vals = list(train.columns.drop(["label"]))
x_train = train[x_vals].values / 255
x_test = test[x_vals].values / 255
y_train = train.label.astype(str).str.get_dummies().replace({0: 0.1, 1: 0.9}).values
y_test = test.label.astype(str).str.get_dummies().replace({0: 0.1, 1: 0.9}).values

# weights are meant to range from -0.05 to 0.05
# (the random sign below is drawn once per matrix, not once per weight)
input_weights_w_ij = np.divide(np.random.rand(len(x_train[0]), n_hidden), 20 * (-1) ** random.randint(0, 1))
hidden_weights_w_jk = np.divide(np.random.rand(n_hidden + 1, 10), 20 * (-1) ** random.randint(0, 1))

# activation function: sigmoid of the weighted sum of the inputs
def sigmoid(x, w):
    z = np.dot(x, w)
    return 1 / (1 + np.exp(-z))
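For reference, the initialisation above draws the random sign once per matrix, so every weight in a given matrix ends up with the same sign. If the intent is for each weight to fall independently in the range -0.05 to 0.05, one possible sketch is shown below. The sizes 784 and 30 and the seeded generator `rng` are assumptions for illustration, not values taken from the question.

```python
import numpy as np

# Assumed sizes for illustration: 784 input pixels, 30 hidden units, 10 classes.
n_inputs, n_hidden, n_outputs = 784, 30, 10

# Seeded generator so the example is reproducible (an assumption, not from the post).
rng = np.random.default_rng(0)

# Each weight is drawn independently from the half-open interval [-0.05, 0.05).
input_weights_w_ij = rng.uniform(-0.05, 0.05, size=(n_inputs, n_hidden))
# The extra row holds the weights for the bias unit appended to the hidden layer.
hidden_weights_w_jk = rng.uniform(-0.05, 0.05, size=(n_hidden + 1, n_outputs))

print(input_weights_w_ij.min(), input_weights_w_ij.max())
```

With this form both positive and negative weights appear in the same matrix, which matches the range stated in the comment.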
Here is my stochastic gradient descent function:
from tqdm import tqdm

def gradient_descent(inputs, outputs, input_weights, hidden_weights, iterations):
    # eta (learning rate) and alpha (momentum) are globals defined elsewhere
    delta_wjk = np.zeros(hidden_weights.shape)
    delta_wij = np.zeros(input_weights.shape)
    accuracy_array = []
    for i in tqdm(range(iterations)):
        for jj in range(inputs.shape[0]):
            # forward pass for one training example
            Xi = inputs[jj, :]
            hidden = sigmoid(Xi, input_weights)
            hidden = np.append(hidden, 1)  # append the bias unit
            yhat = sigmoid(hidden, hidden_weights)
            # gradients for hidden to output weights
            g_wjk = (outputs[jj, :] - yhat) * yhat * (1 - yhat)
            # gradients for input to hidden weights
            g_wij = hidden * (1 - hidden) * np.dot(g_wjk, hidden_weights.T)
            # weight changes, including the momentum term
            delta_wjk = eta * np.dot(np.atleast_2d(hidden).T, np.atleast_2d(g_wjk)) + alpha * delta_wjk
            delta_wij = eta * np.dot(np.atleast_2d(Xi).T, np.atleast_2d(g_wij[:-1])) + alpha * delta_wij
            print('delta_wij', delta_wij)
            # update weights
            input_weights -= delta_wij
            hidden_weights -= delta_wjk
        accuracy_array.append(acc_score(x_train, y_train, hidden_weights))
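One thing worth checking in the function above is the sign of the update. The gradient term is written as `(target - yhat) * yhat * (1 - yhat)`, which is the negative of the squared-error gradient with respect to the pre-activation, so a descent step would add `eta * gradient * input` rather than subtract it. The sketch below is a toy single-neuron example, not the code from the question; the helper `final_error`, the input `x`, the target `t`, and the starting weights are all made up for illustration. It compares the two signs under the assumption of a squared-error loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def final_error(sign, steps=200, eta=0.5):
    """Train one sigmoid neuron on one example; return the final squared error.

    sign=+1 adds the update to the weights, sign=-1 subtracts it.
    """
    x = np.array([0.5, 1.0])     # one input value plus a bias input (made up)
    t = 0.9                      # target, same 0.9 encoding as in the question
    w = np.array([0.01, -0.02])  # small starting weights (made up)
    for _ in range(steps):
        y = sigmoid(np.dot(x, w))
        g = (t - y) * y * (1 - y)    # same form as g_wjk in the question
        w = w + sign * eta * g * x   # the only difference is the sign
    return 0.5 * (t - sigmoid(np.dot(x, w))) ** 2

err_add = final_error(+1.0)  # weights += eta * g * x
err_sub = final_error(-1.0)  # weights -= eta * g * x
print("error when adding the update:     ", err_add)
print("error when subtracting the update:", err_sub)
```

In this toy setting, adding the update drives the error toward zero, while subtracting it pushes the output away from the target and saturates the sigmoid, which in turn makes later gradients very small. Whether this is the only cause of the behaviour described here is not something the sketch can establish.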
When I print delta_wij, the only non-zero values shown are in the final nested array. It looks like this:
delta_wij [[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
...
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
[-4.58272825e-265 -4.41384094e-265 -4.29442019e-265 ... -3.77385114e-265
-4.47469287e-265 -4.03663845e-265]]
delta_wij [[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
...
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
[ 0.00000000e+000 0.00000000e+000 0.00000000e+000 ... 0.00000000e+000
0.00000000e+000 0.00000000e+000]
[-4.12445543e-265 -3.97245684e-265 -3.86497817e-265 ... -3.39646603e-265
-4.02722358e-265 -3.63297460e-265]]
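So it updates the last nested array but none of the others. I can see that the numbers there are on the order of e-265, yet none of the other nested arrays show values of that kind at all. Is there something I can do to force these arrays to show those values and use them for the update, or is there a better way? This seems to be hurting my accuracy (it returns the same accuracy on every iteration). Any help would be greatly appreciated. Thanks!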
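One way to see what the abbreviated printout hides is to summarize the update matrix instead of printing it. The sketch below is not the code from the question: the helper `summarize_update`, the 6-pixel input `xi`, and the gradient vector `g_hidden` are hypothetical and chosen only for illustration. It shows that when an update is built as an outer product of the input and the hidden-layer gradient, every row that belongs to a zero-valued input pixel is exactly zero. MNIST images are mostly blank near the borders, and NumPy prints only the first and last few rows of a large array, so a printout full of zeros does not by itself show that no weights are being updated.

```python
import numpy as np

def summarize_update(delta):
    """Return summary statistics for an update matrix (hypothetical helper)."""
    # Indices of rows that contain at least one non-zero entry.
    nonzero_rows = np.flatnonzero(np.any(delta != 0, axis=1))
    return {
        "nonzero_entries": int(np.count_nonzero(delta)),
        "nonzero_rows": nonzero_rows.tolist(),
        "max_abs": float(np.abs(delta).max()),
    }

eta = 0.1                                       # learning rate (made up)
xi = np.array([0.0, 0.0, 0.5, 1.0, 0.0, 0.0])   # toy "image": blank at both ends
g_hidden = np.array([0.02, -0.01, 0.03])        # toy hidden-layer gradient

# Same construction as delta_wij in the question, without the momentum term.
delta = eta * np.outer(xi, g_hidden)

summary = summarize_update(delta)
print(summary)
```

Calling a helper like this on `delta_wij` inside the training loop would show how many entries are actually non-zero and how large they are. To print the full array instead, `np.set_printoptions(threshold=sys.maxsize)` (after `import sys`) turns off the abbreviation.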