Why does the performance of my backpropagation algorithm plateau?

I am learning how to write neural networks, and I am currently working on a backpropagation algorithm with one input layer, one hidden layer, and one output layer. The algorithm works, and when I throw some test data

x_train = np.array([[1.,2.,-3.,10.],[0.3,-7.8,1.,2.]])
y_train = np.array([[10,-3,6,1],[1,1,1]])

into my algorithm, using the default of 3 hidden units and the default learning rate of 10e-4,

Backprop.train(x_train, y_train, tol=10e-1)
x_pred = Backprop.predict(x_train)

I get very good results:

Tolerances: [10e-1, 10e-2, 10e-3, 10e-4, 10e-5]
Iterations: [2678, 5255, 7106, 14270, 38895]
Mean absolute error: [0.42540, 0.14577, 0.04264, 0.01735, 0.00773]
Sum of squared errors: [1.85383, 0.21345, 0.01882, 0.00311, 0.00071]
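
For reference, the metrics above are presumably computed along these lines (my assumption; the post does not show this code):

mae = np.mean(np.abs(x_pred - y_train))   # Mean absolute error over all outputs
sse = np.sum((x_pred - y_train) ** 2)     # Sum of squared errors over all outputs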

Each time, the sum of squared errors drops by one decimal place, just as I expect. However, when I use test data like this

X_train = np.random.rand(20,7)
Y_train = np.random.rand(20,2)

I get:

Tolerances: [10e+1, 10e-0, 10e-1, 10e-2, 10e-3]
Iterations: [11, 19, 63, 80, 7931]
Mean absolute error: [0.30322, 0.25076, 0.25292, 0.24327, 0.24255]
Sum of squared errors: [4.69919, 3.43997, 3.50411, 3.38170, 3.16057]

Nothing really changes. I have checked my hidden units, gradients, and weight matrices, and they all differ; the gradients really do shrink, as required by the stopping rule I set up for my backpropagation algorithm:

if (np.sum(E_hidden**2) + np.sum(E_output**2)) < tol:
    learning = False

where E_hidden and E_output are my gradient matrices. My question is: how can the metrics stay practically unchanged for some data even though the gradients are shrinking, and what can I do about it?
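
For context on why a plateau at roughly these values is expected here: Y_train is uniform random noise with no relation to X_train, so the best any model can do is predict the per-column mean of the targets, and the gradients shrink toward zero at that constant fit just as they would at a genuine solution. A minimal baseline check (my own sketch, not part of the original post):

import numpy as np

# The best constant predictor for purely random targets is the column mean.
Y_train = np.random.rand(20, 2)
resid = Y_train - Y_train.mean(axis=0)
print(np.mean(np.abs(resid)))   # ~0.25        -> matches the reported MAE plateau
print(np.sum(resid ** 2))       # ~40/12 = 3.33 -> matches the reported SSE plateau

So shrinking gradients combined with flat metrics are exactly what unlearnable targets look like.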

My backpropagation looks like this:

import numpy as np

class Backprop:


    def sigmoid(r):
        return (1 + np.exp(-r)) ** (-1)

    def train(x_train, y_train, hidden_units=3, learning_rate=10e-4, tol=10e-3):
        # We need y_train to be 2D. There should be as many rows as there are x_train vectors
        N = x_train.shape[0]
        I = x_train.shape[1]
        J = hidden_units 
        K = y_train.shape[1] # Number of output units

            # Add the bias units to x_train
        bias = -np.ones(N).reshape(-1,1) # Make it 2D so we can stack it
            # Make the row vector a column vector for easier use when applying matrices. Afterwards, x_train.shape = (N,I+1)
        x_train = np.hstack((x_train,bias)).T # x_train.shape = (I+1,N) -> N column vectors of respective length I+1
        
            # Create our weight matrices
        W_input = np.random.rand(J,I+1) # W_input.shape = (J,I+1)
        W_hidden = np.random.rand(K,J+1) # W_hidden.shape = (K,J+1)
        m = 0
        learning = True
        while learning:

            ##### ----- Phase 1: Forward Propagation ----- #####

                # Create the total input to the hidden units
            u_hidden = W_input @ x_train # u_hidden.shape = (J,N) -> N column vectors of respective length J. For every training vector we get J hidden states
                # Create the hidden units 
           
            h = Backprop.sigmoid(u_hidden) # h.shape = (J,N)
                # Create the total input to the output units
            
            bias = -np.ones(N)
            h = np.vstack((h,bias)) # h.shape = (J+1,N)
            u_output = W_hidden @ h # u_output.shape = (K,N). For every training vector we get K output states. 
                # In the code itself the following is not necessary, because, as we remember from the above, the output activation function
                # is the identity function, but let's do it anyway for the sake of clarity
            y_pred = u_output.copy() # Now,y_pred has the same shape as y_train
            
            
            ##### ----- Phase 2: Backward Propagation ----- #####

                # We will calculate the delta terms now and begin with the delta term of the output unit
                
                # We will transpose several times now. Before, having column vectors was convenient, because matrix multiplication is
                # more intuitive then. But now, we need to work with indices and need the right dimensions. Yes, loops are inefficient, but
                # they provide much more clarity so that we can easily connect the theory above with our code.

                # We don't need the delta_output right now, because we will update W_hidden with a loop. But we need it for the delta term
                # of the hidden unit.
            delta_output = y_pred.T - y_train 
                # Calculate our error gradient for the output units
            E_output = np.zeros((K,J+1))
            for k in range(K):
                for j in range(J+1):
                    for n in range(N):
                        E_output[k,j] += (y_pred.T[n,k] - y_train[n,k]) * h.T[n,j] 
                # Calculate our change in W_hidden
            W_delta_output = -learning_rate * E_output
                # Update the old weights
            W_hidden = W_hidden + W_delta_output

                # Let's calculate the delta term of the hidden unit
            delta_hidden = np.zeros((N,J+1))
            for n in range(N):
                for j in range(J+1):
                    for k in range(K):
                        delta_hidden[n,j] += h.T[n,j]*(1 - h.T[n,j]) * delta_output[n,k] * W_delta_output[k,j]

                # Calculate our error gradient for the hidden units, but exclude the hidden bias unit, because W_input and the hidden bias
                # unit don't share any relation at all
            E_hidden = np.zeros((J,I+1))
            for j in range(J):
                for i in range(I+1):
                    for n in range(N):
                        E_hidden[j,i] += delta_hidden[n,j]*x_train.T[n,i]
                # Calculate our change in W_input
            W_delta_hidden = -learning_rate * E_hidden
            W_input = W_input + W_delta_hidden
            
            if (np.sum(E_hidden**2) + np.sum(E_output**2)) < tol:
                learning = False
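            # Note: this stops once the squared gradient norms are small. Gradients
            # also vanish at the best constant fit (predicting the column means of
            # y_train), so small gradients alone do not imply a good fit on noisy targets.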
            
            m += 1 # Iteration count
            
        Backprop.weights = [W_input,W_hidden]
        Backprop.iterations = m
        Backprop.errors = [E_hidden,E_output]


    ##### ----- #####


    def predict(x):
        N = x.shape[0]
            # x1 = Backprop.weights[1][:,:-1] @ Backprop.sigmoid(Backprop.weights[0][:,:-1] @ x.T) # Trying this, we see we really need
            # to add a bias here as well if we also trained using a bias

            # Add the bias units to x
        bias = -np.ones(N).reshape(-1,1) # Make it 2D so we can stack it
            # Make the row vector a column vector for easier use when applying matrices.
        x = np.hstack((x,bias)).T
        h = Backprop.weights[0] @ x
        u = Backprop.sigmoid(h) # We need to transform the data using the sigmoidal function
        h = np.vstack((u,bias.reshape(1,-1)))

        return (Backprop.weights[1] @ h).T
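
As a side note, the three nested loops in train can be replaced one-for-one with matrix products, which is much faster for large N. A sketch of the equivalent vectorized gradients, assuming the shapes used inside train (delta_output is (N,K), h is (J+1,N), x_train is (I+1,N)):

# Equivalent to the k/j/n loop that fills E_output:
E_output = delta_output.T @ h.T                                   # (K, J+1)
# Equivalent to the n/j/k loop that fills delta_hidden:
delta_hidden = h.T * (1 - h.T) * (delta_output @ W_delta_output)  # (N, J+1)
# Equivalent to the j/i/n loop that fills E_hidden (bias column dropped):
E_hidden = delta_hidden[:, :-1].T @ x_train.T                     # (J, I+1)

These mirror the posted loops exactly, including the use of W_delta_output in the hidden delta term, although textbook backpropagation would use W_hidden there.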

Solution

I found the answer. If, in Backprop.predict, I write

output = (Backprop.weights[1] @ h).T
return output

instead of the above, everything works fine.
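
Why would that change anything? If the original one-liner had ended with a stray trailing comma (several lines in this post carried such commas), that alone would explain it: a trailing comma after a return expression makes Python return a 1-tuple instead of the array, which silently breaks any arithmetic done on the prediction afterwards. A minimal demonstration of that pitfall (hypothetical code, not from the post):

import numpy as np

def predict_with_comma():
    return np.array([1.0, 2.0]),   # stray trailing comma: returns a 1-tuple

print(type(predict_with_comma()))  # <class 'tuple'>, not numpy.ndarray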
