How to fix a neural network for a classification problem whose final-layer output vector is stuck at 0.5

The output layer is stuck at the vector [0.5, 0.5]. Can anyone help me figure out whether there is any problem in the code?

The neural network I am trying to train is an XOR gate, so the output vector should approach the one-hot vector representing the correct class (0 or 1); but even after all the epochs, the output vector is still stuck at [0.5, 0.5].

import numpy as np
from numpy import random as rnd
from scipy.special import expit

class Backpropogation:

    def setupWeightsBiases(self):
        for i in range(1,self.num_layers):
            self.weights_dict[i] = rnd.rand(self.layer_spec[i],self.layer_spec[i - 1])
            self.bias_dict[i] = rnd.rand(self.layer_spec[i],1)

    def __init__(self,hidden_layer_neurons_tuple,train_data,num_output_classes,output_layer_func='sigmoid'):
        self.train_input = train_data[0]
        self.input_layer_size = self.train_input[0].size

        self.train_input = self.train_input.reshape(self.train_input.shape[0],self.input_layer_size).T

        self.output_layer_size = num_output_classes
        self.train_output = train_data[1]
        print(self.train_output.shape)

        num_hidden_layer = len(hidden_layer_neurons_tuple)
        self.hidden_layer_neurons_tuple = hidden_layer_neurons_tuple
        self.layer_spec = [self.input_layer_size] + \
                          list(hidden_layer_neurons_tuple) + \
                          [num_output_classes]
        self.layer_spec = tuple(self.layer_spec)

        self.num_layers = num_hidden_layer + 2
        self.train_data = train_data
        self.activation_layer_gradient_dict = {}
        self.preactivation_layer_gradient_dict = {}
        self.weights_gradient_dict = {}
        self.bias_gradient_dict = {}
        self.curr_input = None
        self.curr_output = None
        self.weights_dict = {}
        self.preactivation_layer_dict = {}
        self.activation_layer_dict = {}
        self.bias_dict = {}
        self.setupWeightsBiases()
        self.output = None
        self.output_diff = None
        self.num_output_classes = num_output_classes

    def predictClass(self):
        return np.argmax(self.activation_layer_dict[self.num_layers - 1])

    def forwardPropogation(self,input):
        # Load h[0] as the input data
        self.activation_layer_dict[0] = input

        '''
        Load the input data into h[0], then for each layer k = 1..L:
            a[k] = W[k] @ h[k-1] + b[k]
            h[k] = activation(a[k])
        Finally, recompute the Lth layer's output with the special output function.
        '''
        for i in range(1,self.num_layers):
            self.preactivation_layer_dict[i] = \
                np.matmul(self.weights_dict[i],self.activation_layer_dict[i - 1]) + \
                self.bias_dict[i]
            # print(self.preactivation_layer_dict[i])
            vec = self.preactivation_layer_dict[i]
            self.activation_layer_dict[i] = self.activationFunction(vec)
            # This will change h[L] to y'
        self.activation_layer_dict[self.num_layers - 1] = self.outputFunction()

    def findGradients(self,index):
        class_label = self.train_output[index]
        output_one_hot_vector = np.zeros((self.num_output_classes,1))
        output_one_hot_vector[class_label] = 1
        output = self.activation_layer_dict[self.num_layers - 1]
        self.preactivation_layer_gradient_dict[self.num_layers - 1] = -1 * (output_one_hot_vector - output)

        for layer in reversed(range(1,self.num_layers)):
            self.weights_gradient_dict[layer] = np.matmul(self.preactivation_layer_gradient_dict[layer],self.activation_layer_dict[layer - 1].T)

            self.bias_gradient_dict[layer] = self.preactivation_layer_gradient_dict[layer]

            self.activation_layer_gradient_dict[layer - 1] = np.matmul(self.weights_dict[layer].T,self.preactivation_layer_gradient_dict[layer])

            if layer != 1:
                self.preactivation_layer_gradient_dict[layer - 1] = np.multiply(
                    self.activation_layer_gradient_dict[layer - 1],self.outputFunctionDiff(layer - 1))

    def activationFunction(self,vec,type='sigmoid'):

        if type == 'sigmoid':
            return 1 / (1 + expit(-vec))
        else:
            print('Please select correct output function')
            exit()

    def outputFunction(self,type='sigmoid'):
        if type == 'sigmoid':
            return 1 / (1 + expit(-self.preactivation_layer_dict[self.num_layers - 1]))
        else:
            print('Please select correct output function')
            exit()

    def outputFunctionDiff(self,layer,type='sigmoid'):
        op_layer = self.num_layers - 1
        if type == 'sigmoid':
            vec = self.preactivation_layer_dict[layer]
            return np.multiply(self.activationFunction(vec),1 - self.activationFunction(vec))

        else:
            print('Please select correct output function')
            exit()

    def updateWeightsAndBiases(self,learning_rate):
        for layer in range(1,self.num_layers):
            self.weights_dict[layer] = self.weights_dict[layer] - learning_rate * self.weights_gradient_dict[layer]

            self.preactivation_layer_dict[layer] = self.preactivation_layer_dict[layer] - \
                                                   learning_rate * self.preactivation_layer_gradient_dict[layer]

            if not (layer == self.num_layers - 1):
                self.activation_layer_dict[layer] = self.activation_layer_dict[layer] - \
                                                    learning_rate * self.activation_layer_gradient_dict[layer]

            self.bias_dict[layer] = self.bias_dict[layer] - learning_rate * self.bias_gradient_dict[layer]

    def getLoss(self,index):
        return np.log2(self.activation_layer_dict[self.num_layers - 1][self.train_output[index],0])

    def train(self,learning_rate,num_epochs):
        for curr_epoch in range(num_epochs):
            print('Evaluating at ' + str(curr_epoch))
            index_array = list(np.arange(0,self.train_input.shape[1]))
            np.random.shuffle(index_array)
            for train_data_index in index_array:
                test_input = self.train_input[:,[train_data_index]]
                self.forwardPropogation(test_input)
                # print(self.activation_layer_dict[self.num_layers - 1])
                self.findGradients(train_data_index)
                self.updateWeightsAndBiases(learning_rate)
            print('Loss ' + str(self.getLoss(train_data_index)))

    # Assumes a 784xN 2D array as test input
    # This will return output classes of the data
    def test(self,test_data):
        index_range = test_data.shape[1]
        test_class_list = []
        for index in range(index_range):
            self.forwardPropogation(test_data[:,[index]])
            test_class_list.append(self.predictClass())
        return test_class_list

# train the NN with BP
train_data = (np.array([[0,0],[0,1],[1,0],[1,1]]), np.array([0,1,1,0]))

b = Backpropogation((2,2), train_data, 2)

Solution

The following code (check this for the implementation and this for the theory) implements a neural network with backpropagation from scratch, using a single output unit with sigmoid activation (otherwise it looks similar to your implementation).

Using this, the XOR function can be learned with an appropriate learning rate and number of epochs (although it can sometimes get stuck in a local minimum; you could consider implementing regularizers such as drop-out). Also, you can convert it to your 2-output (softmax?) version. Can you find any problem in your implementation? For example, you can check the following pointers:

  • Updating the parameters in a batch during backpropagation instead of stochastically
  • Running for enough epochs
  • Varying the learning rate
  • Using ReLU activation for the hidden layers instead of sigmoid (to counter vanishing gradients), etc.; see the sketch after this list
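
As a minimal illustration of the last pointer, the sketch below shows ReLU and its gradient as drop-in replacements for the hidden-layer sigmoid; the relu/grad_relu helper names are hypothetical, not part of either implementation:

import numpy as np

def relu(x):
    # element-wise ReLU for the hidden-layer pre-activations
    return np.maximum(0, x)

def grad_relu(x):
    # derivative of ReLU w.r.t. its input: 1 where x > 0, else 0
    return (x > 0).astype(float)

# Forward pass: hidden layers would use relu(A[i]) instead of sigmoid(A[i]), while
# the output layer keeps sigmoid. Backward pass: apply grad_relu to the stored
# pre-activations A[i], in place of grad_sigmoid applied to the activations H[i].

Note that the first pointer (batch updates) is already illustrated by the fit() method below, which accumulates the per-sample gradients and applies a single averaged update per epoch.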

import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

class FFSNNetwork:
  
  def __init__(self,n_inputs,hidden_sizes=[2]):
    #intialize the inputs
    self.nx = n_inputs
    self.ny = 1  # number of neurons in the output layer
    self.nh = len(hidden_sizes)
    self.sizes = [self.nx] + hidden_sizes + [self.ny]
    
    self.W = {}
    self.B = {}
    for i in range(self.nh+1): 
        self.W[i+1] = np.random.rand(self.sizes[i],self.sizes[i+1])
        self.B[i+1] = np.random.rand(1,self.sizes[i+1])

  def sigmoid(self,x):
    return 1.0/(1.0 + np.exp(-x))
  
  def forward_pass(self,x):
    self.A = {}
    self.H = {}
    self.H[0] = x.reshape(1,-1)
    for i in range(self.nh+1):
      self.A[i+1] = np.matmul(self.H[i],self.W[i+1]) + self.B[i+1]
      self.H[i+1] = self.sigmoid(self.A[i+1]) 
    return self.H[self.nh+1]
  
  def grad_sigmoid(self,x):
    return x*(1-x) 

  def grad(self,x,y):
    self.forward_pass(x)
    self.dW = {}
    self.dB = {}
    self.dH = {}
    self.dA = {}
    L = self.nh + 1
    self.dA[L] = (self.H[L] - y)
    for k in range(L, 0, -1):
      self.dW[k] = np.matmul(self.H[k-1].T,self.dA[k])
      self.dB[k] = self.dA[k]
      self.dH[k-1] = np.matmul(self.dA[k],self.W[k].T)
      self.dA[k-1] = np.multiply(self.dH[k-1],self.grad_sigmoid(self.H[k-1])) 
    
  def fit(self,X,Y,epochs=1,learning_rate=1,initialize=True):
    
    # initialize w,b
    if initialize:
      for i in range(self.nh+1):
        self.W[i+1] = np.random.randn(self.sizes[i],self.sizes[i+1])
        self.B[i+1] = np.zeros((1,self.sizes[i+1]))
      
    for e in range(epochs):
      dW = {}
      dB = {}
      for i in range(self.nh+1):
        dW[i+1] = np.zeros((self.sizes[i],self.sizes[i+1]))
        dB[i+1] = np.zeros((1,self.sizes[i+1]))
      for x,y in zip(X,Y):
        self.grad(x,y)
        for i in range(self.nh+1):
          dW[i+1] += self.dW[i+1]
          dB[i+1] += self.dB[i+1]
        
      m = X.shape[0]  # average the accumulated gradients over the number of samples
      for i in range(self.nh+1):
        self.W[i+1] -= learning_rate * dW[i+1] / m
        self.B[i+1] -= learning_rate * dB[i+1] / m
      
      Y_pred = self.predict(X)
      print('loss at epoch {} = {}'.format(e,mean_squared_error(Y_pred,Y)))
    
  def predict(self,X):
    Y_pred = []
    for x in X:
      y_pred = self.forward_pass(x)
      Y_pred.append(y_pred)
    return np.array(Y_pred).squeeze()

Now, train the network:

# train the network with two hidden layers of 2 neurons each
ffsnn = FFSNNetwork(2, [2, 2])
# XOR data
X_train, y_train = np.array([[0,0],[0,1],[1,0],[1,1]]), np.array([0,1,1,0])
ffsnn.fit(X_train, y_train, epochs=5000, learning_rate=.15)

Next, use the network to predict:

y_pred_prob = ffsnn.predict(X_train) # P(y = 1)
y_pred = (y_pred_prob >= 0.5).astype("int").ravel() # threshold = 0.5

X_train
# array([[0, 0],
#        [0, 1],
#        [1, 0],
#        [1, 1]])
y_train
# array([0, 1, 1, 0])
y_pred_prob
# array([0.00803102, 0.99439243, 0.99097831, 0.00664639])
y_pred
# array([0, 1, 1, 0])
accuracy_score(y_train, y_pred)
# 1.0

Note that the MSE between the true and predicted y values is used here to plot the loss function; you could also plot the BCE (binary cross-entropy) loss, as in the sketch below.
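
For reference, a minimal numpy sketch of that BCE computation; the bce_loss helper is a hypothetical name, not part of the code above:

import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    # binary cross-entropy, with predictions clipped for numerical stability
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

For example, bce_loss(y_train, y_pred_prob) could be logged once per epoch in place of mean_squared_error.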

Finally, the following animations show how the loss function is minimized and how the decision boundary is learned:

[animation: the loss being minimized over training epochs]

[animation: the decision boundary being learned]

Note that the green and red points represent the positive (label 1) and negative (label 0) training data points, respectively; in the animation above, notice how they become separated by the decision boundary in the last training epochs (with darker regions for the negative data points of XOR and lighter regions for the positive ones).

[figure: the learned XOR decision boundary]
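
A minimal sketch of how such a decision-boundary picture could be produced with matplotlib, assuming the trained ffsnn network from above (the grid resolution and colormap are arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt

# evaluate P(y = 1) on a grid covering the input square
xx, yy = np.meshgrid(np.linspace(-0.5, 1.5, 100), np.linspace(-0.5, 1.5, 100))
grid = np.c_[xx.ravel(), yy.ravel()]
probs = ffsnn.predict(grid).reshape(xx.shape)

plt.contourf(xx, yy, probs, levels=20, cmap="RdYlGn")  # shade the plane by predicted probability
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap="RdYlGn", edgecolors="k")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()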

You can implement the same thing in a few lines of code with a high-level deep learning library such as keras:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(2,), name="in")
x = layers.Dense(4, activation="relu", name="dense_1")(inputs)
x = layers.Dense(4, name="dense_2")(x)
outputs = layers.Dense(1, activation="sigmoid", name="out")(x)

model = keras.Model(inputs=inputs,outputs=outputs)
X_train, y_train = np.array([[0,0],[0,1],[1,0],[1,1]]), np.array([0,1,1,0])
model.compile(
    optimizer=keras.optimizers.Adam(),  # optimizer
    # loss function to minimize
    loss=tf.keras.losses.BinaryCrossentropy(),
    # list of metrics to monitor
    metrics=[keras.metrics.BinaryAccuracy(name="accuracy")],
)

print("Fit model on training data")
history = model.fit(
    X_train, y_train, batch_size=4, epochs=1000)
# ...
# Epoch 371/1000
# 4/4 [==============================] - 0s 500us/sample - loss: 0.5178 - accuracy: 0.7500
# Epoch 372/1000
# 4/4 [==============================] - 0s 499us/sample - loss: 0.5169 - accuracy: 0.7500
# Epoch 373/1000
# 4/4 [==============================] - 0s 499us/sample - loss: 0.5160 - accuracy: 1.0000
# Epoch 374/1000
# 4/4 [==============================] - 0s 499us/sample - loss: 0.5150 - accuracy: 1.0000
# ...

print("Evaluate")
results = model.evaluate(X_train, y_train, batch_size=4)
print("loss,acc:",results)
# loss,acc: [0.1260240525007248,1.0]

The following figure shows the loss/accuracy over the training epochs.

[figure: loss and accuracy over training epochs]
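
Such a figure can be rebuilt from the history object returned by model.fit(); a minimal sketch, assuming matplotlib (the "accuracy" key matches the metric name set in compile() above):

import matplotlib.pyplot as plt

plt.plot(history.history["loss"], label="loss")
plt.plot(history.history["accuracy"], label="accuracy")
plt.xlabel("epoch")
plt.legend()
plt.show()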

Finally, the same with keras using a softmax output (instead of sigmoid):

from keras.utils import to_categorical

X_train, y_train = np.array([[0,0],[0,1],[1,0],[1,1]]), np.array([0,1,1,0])
y_train = to_categorical(y_train, num_classes=2)

inputs = keras.Input(shape=(2,), name="in")
x = layers.Dense(4, activation="relu", name="dense_1")(inputs)
x = layers.Dense(4, name="dense_2")(x)
outputs = layers.Dense(2, activation="softmax", name="out")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

model.compile(
    optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc']
)
print("Fit model on training data")
history = model.fit(
    X_train, y_train, batch_size=4, epochs=2000)
# Epoch 663/2000
# 4/4 [==============================] - 0s 500us/sample - loss: 0.3893 - acc: 0.7500
# Epoch 664/2000
# 4/4 [==============================] - 0s 500us/sample - loss: 0.3888 - acc: 1.0000
# Epoch 665/2000
# 4/4 [==============================] - 0s 500us/sample - loss: 0.3878 - acc: 1.0000
print("Evaluate")
results = model.evaluate(X_train, y_train, batch_size=4)
print("loss, acc:", results)
# loss, acc: [0.014970880933105946, 1.0]

with the following loss/accuracy convergence:

[figure: loss/accuracy convergence of the softmax model]
