Validation loss increasing from the very start

How to fix a validation loss that increases from the very beginning

I have been working on a very simple binary cat/dog classification project with machine learning. I understand the problem of overfitting, but what is strange to me is that the validation loss starts rising from the very beginning. I have tried many different sets of hyperparameters, including L2 regularization, learning rate decay and stochastic gradient descent, as well as a large training set, but the problem persists. Here is the learning graph from one of the trials (the horizontal axis should be per 10 epochs):

Learning graph 1

The hyperparameters are: two hidden layers with 50 and 10 units respectively, initial alpha = 0.05, alpha decay rate = 0.95 per 50 epochs, mini-batch size = 64, lambda = 0.05.
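In code, the learning-rate schedule I have in mind is roughly this sketch (the values are taken from the hyperparameters above; num_epoch is the total epoch count):

alpha = 0.05 # initial learning rate
k = 0.95     # decay factor
num_epoch = 1000
for epoch in range(num_epoch):
    # ... one pass over the mini-batches ...
    if epoch % 50 == 0 and epoch > 0:
        alpha = alpha * k # multiply in 0.95 every 50 epochs: 0.05 -> 0.0475 -> ...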

Here are some other example learning graphs:

Learning graph 2

Learning graph 3

I developed my model based on what is provided in Andrew Ng's Deep Learning Specialization, so I did not expect many bugs. In case it is needed, my full code is attached below:

import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
from scipy import special

#Data Preprocessing (the dev set is built the same way; a hypothetical sketch is included below)
path = '/Users/bobby/Downloads/kagglecatsanddogs_3367a/PetImages'
train_set = []
img_size = 80
categories = ['dogs_train','cats_train']
epsilon = 1e-8

for category in categories:
    path_animal = os.path.join(path,category)
    for img in os.listdir(path_animal):
        try:
            img_array = cv2.imread(os.path.join(path_animal,img),cv2.IMREAD_GRAYSCALE)
            new_img_array = cv2.resize(img_array,(img_size,img_size))
            flattened_img_array = new_img_array.reshape(img_size*img_size)
            train_set.append([flattened_img_array,categories.index(category)])
        except: # skip unreadable or corrupt images
            continue

import random
random.shuffle(train_set)

X_train = []
Y_train = []
for sample in train_set:
    X_train.append(sample[0])
    Y_train.append(sample[1])

X_train = (np.array(X_train).T)/255
Y_train = np.array(Y_train).reshape((1,np.array(Y_train).shape[0]))
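
# NOTE (added sketch): the dev-set preprocessing mirrors the training loop above.
# The directory names 'dogs_dev' and 'cats_dev' are assumptions, since the original
# post omits this part, but X_dev/Y_dev are needed by the call at the bottom.
dev_set = []
dev_categories = ['dogs_dev','cats_dev'] # same label order as training: dogs = 0, cats = 1
for category in dev_categories:
    path_animal = os.path.join(path,category)
    for img in os.listdir(path_animal):
        try:
            img_array = cv2.imread(os.path.join(path_animal,img),cv2.IMREAD_GRAYSCALE)
            new_img_array = cv2.resize(img_array,(img_size,img_size))
            dev_set.append([new_img_array.reshape(img_size*img_size),dev_categories.index(category)])
        except: # skip unreadable or corrupt images
            continue
random.shuffle(dev_set)
X_dev = (np.array([sample[0] for sample in dev_set]).T)/255
Y_dev = np.array([sample[1] for sample in dev_set]).reshape((1,len(dev_set)))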

def create_mini_batches(X,Y,mini_batch_size):
    m = X.shape[1]
    mini_batches = []
    num_mini_batches = m // mini_batch_size
    
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:,permutation]
    shuffled_Y = Y[:,permutation]
    
    for i in range(num_mini_batches):
        select_X = shuffled_X[:,mini_batch_size*i : mini_batch_size*(i+1)]
        select_Y = shuffled_Y[:,mini_batch_size*i : mini_batch_size*(i+1)]
        mini_batch = (select_X,select_Y)
        mini_batches.append(mini_batch)
    
    if m % mini_batch_size != 0:
        last_X = shuffled_X[:,mini_batch_size*num_mini_batches:m]
        last_Y = shuffled_Y[:,mini_batch_size*num_mini_batches:m]
        last_mini_batch = (last_X,last_Y)
        mini_batches.append(last_mini_batch)
        
    return mini_batches

def initialize_parameters(layers_dims): 
    L = len(layers_dims) # number of layers (including input layer),in this case L=4.
    parameters = {}
    for l in range(1,L): # range(1,4).
        parameters['W' + str(l)] = np.random.randn(layers_dims[l],layers_dims[l-1]) * np.sqrt(2/layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l],1))
    return parameters

def sigmoid(Z):
    A = special.expit(Z)
    return A,Z

def relu(Z):
    A = np.maximum(0.01*Z,Z) # note: this is actually a leaky ReLU (slope 0.01 for Z < 0)
    return A,Z

def forward_propagation(X,parameters):

    caches = [] #list containing Z for every node
    A = X
    L = int(len(parameters)/2)
    
    for l in range(1,L):
        A_prev = A
        W = parameters['W'+str(l)]
        b = parameters['b'+str(l)]
        Z = np.dot(W,A_prev) + b
        A,activation_cache = relu(Z) #activation_cache contains z[l].
        linear_cache = (A_prev,W,b) #linear_cache contains A[l-1],W[l],b[l].
        cache = (linear_cache,activation_cache)
        caches.append(cache)
    
    W = parameters['W'+str(L)]
    b = parameters['b'+str(L)]
    Z = np.dot(W,A) + b
    AL,activation_cache = sigmoid(Z)
    linear_cache = (A,W,b) # A here is A[L-1] for the output layer
    cache = (linear_cache,activation_cache)
    caches.append(cache)
    
    return AL,caches

def compute_cost(AL,Y,parameters,lambd):
    m = Y.shape[1] # number of examples
    L = int(len(parameters)/2) # e.g. layers_dims = [6400,50,10,1] gives L = 3
    reg_cost = 0
    
    for l in range(L):
        W = parameters['W' + str(l+1)]
        reg_cost += np.sum(np.square(W))
        
    J = (-1/m)*(np.sum(Y*np.log(AL+epsilon)+(1-Y)*np.log(1-AL+epsilon))) + (1/m) * (lambd/2) * reg_cost
    J = np.squeeze(J)
    return J

def linear_backward(dZ,linear_cache,lambd):
    A_prev,W,b = linear_cache
    m = A_prev.shape[1]
    
    dW = (1/m) * np.dot(dZ,A_prev.T) + (lambd/m)*W
    db = (1/m) * np.sum(dZ,axis=1,keepdims=True)
    dA_prev = np.dot(W.T,dZ)
    
    return dA_prev,dW,db

def relu_gradient(Z):
    dZ = np.where(Z > 0,1.0,0.01) # derivative of the leaky ReLU above
    return dZ

def sigmoid_gradient(Z):
    dZ = special.expit(Z)*(1-special.expit(Z))
    return dZ

def linear_activation_backward(dA,cache,lambd,A,Y,activation):
    linear_cache,activation_cache = cache
    
    if activation == 'relu':
        dZ = dA * relu_gradient(activation_cache)
        dA_prev,dW,db = linear_backward(dZ,linear_cache,lambd)
        
    elif activation == 'sigmoid':
        dZ = A - Y # shortcut for a sigmoid output with cross-entropy loss
        dA_prev,dW,db = linear_backward(dZ,linear_cache,lambd)

    return dA_prev,dW,db

def L_model_backward(AL,Y,caches,lambd):
    grads = {}
    L = len(caches)
    m = AL.shape[1]
    Y = Y.reshape(AL.shape) 
        
    cache_final_layer = caches[L-1]
    grads["dA" + str(L-1)],grads["dW" + str(L)],grads["db" + str(L)] = linear_activation_backward(_,cache_final_layer,AL,activation='sigmoid')
    
    for l in reversed(range(L-1)):
        current_cache = caches[l]
        grads["dA" + str(l)],grads["dW" + str(l+1)],grads["db" + str(l+1)] = linear_activation_backward(grads['dA' + str(l+1)],current_cache,_,activation='relu')
    
    return grads

def update_parameters(parameters,grads,learning_rate):
    L = len(parameters) // 2
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]
    return parameters

def Neural_Network_Model(X_train,Y_train,X_dev,Y_dev,layers_dims,learning_rate,num_epoch,mini_batch_size,lambd,k):
    
    mini_batches = create_mini_batches(X_train,Y_train,mini_batch_size) #[(X{1},Y{1}),(X{2},Y{2}),...,(X{n},Y{n})]
    
    costs_train = []
    costs_dev = []
    parameters = initialize_parameters(layers_dims)
    
    AL_dev,caches_dev = forward_propagation(X_dev,parameters)
    J_dev = compute_cost(AL_dev,Y_dev,parameters,0) # lambd = 0: no regularization term when evaluating
    costs_dev.append(J_dev)
    
    for i in range(num_epoch):
        for mini_batch in mini_batches:
            (minibatch_X,minibatch_Y) = mini_batch 
            AL,caches = forward_propagation(minibatch_X,parameters)
            J_train = compute_cost(AL,minibatch_Y,parameters,lambd)
            grads = L_model_backward(AL,minibatch_Y,caches,lambd)
            parameters = update_parameters(parameters,grads,learning_rate)
        if i % 10 == 0:
            costs_train.append(J_train)
            AL_dev,caches_dev = forward_propagation(X_dev,parameters)
            J_dev = compute_cost(AL_dev,Y_dev,parameters,0)
            costs_dev.append(J_dev)           
        if i % 100 == 0:
            print ("Cost after epoch %i: %f" %(i,J_train))
        if i % 50 == 0 and i > 0:
            learning_rate = learning_rate * k # decay alpha by factor k every 50 epochs
            
    plt.plot(np.squeeze(costs_train),'r')
    plt.plot(np.squeeze(costs_dev),'b')
    plt.ylabel('cost')
    plt.xlabel('epochs (per tens)')
    plt.show()
    
    return parameters,costs_train,costs_dev

parameters_updated,costs_train,costs_dev = Neural_Network_Model(X_train,Y_train,X_dev,Y_dev,[6400,50,10,1],0.05,1000,64,0.05,0.95)

I would really appreciate anyone with the patience to read through my code. If the problem is still overfitting, could you offer some advice on how to deal with it? I am at a loss here, because the validation loss rises at a very early stage, so early stopping would prevent the model from learning any further and would therefore lead to underfitting. Any advice would be appreciated.

Solution

When the validation loss starts increasing early on, as in the images you added, it means there is a problem in the model. Since you did not show your model, it is not clear what it is.
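
One quick sanity check you could run (a minimal sketch that reuses the functions from your post, assuming a roughly balanced dev set): with freshly initialized weights, a sigmoid output unit predicts about 0.5 for everything, so the very first dev cost should be close to -ln(0.5) ≈ 0.693. If it starts far from that, the issue is usually a preprocessing or label bug rather than overfitting.

parameters = initialize_parameters([6400,50,10,1])
AL_dev,_ = forward_propagation(X_dev,parameters) # untrained forward pass
J0 = compute_cost(AL_dev,Y_dev,parameters,0)     # lambd = 0: no regularization term
print("initial dev cost: %.3f (expected ~0.693 for balanced classes)" % J0)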

You can check the following links, which may help you:

Or add your full code.
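
Since the backward pass is hand-written, another standard check from the same course material is numerical gradient checking: compare the analytic gradients against a centered finite difference on a tiny batch. A minimal sketch, assuming the forward_propagation and compute_cost functions from your post (the helper name numerical_grad is made up for illustration):

def numerical_grad(parameters,name,i,j,X,Y,lambd,eps=1e-7):
    # perturb one entry (i,j) of the matrix parameters[name], e.g. name = 'W1'
    plus = {key: val.copy() for key,val in parameters.items()}
    minus = {key: val.copy() for key,val in parameters.items()}
    plus[name][i,j] += eps
    minus[name][i,j] -= eps
    AL_p,_ = forward_propagation(X,plus)
    AL_m,_ = forward_propagation(X,minus)
    # centered difference approximation of dJ/dW[name][i,j]
    return (compute_cost(AL_p,Y,plus,lambd) - compute_cost(AL_m,Y,minus,lambd)) / (2*eps)

If this differs from the corresponding entry of grads['dW1'] returned by L_model_backward by a relative error much larger than about 1e-4, the backward pass has a bug.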
