如何解决准确度通过 Epochs 提高,但返回到评估的初始准确度
我正在开展一个项目,以在 EEG 数据集上使用神经网络重建研究结果。在我从事该项目的整个过程中,我一直遇到重复的问题,即模型在整个 epoch 中都有一些精度改进,但对于评估,精度总是返回到初始值,特别是 1/NUM_CLASSES。其中 NUM_CLASSES 是分类类别的数量。老实说,我在这一点上被困住了,我认为模型过度拟合并试图调整我的数据预处理以进行补偿,但运气不佳。
代码如下:
# Filters out warnings
import warnings
warnings.filterwarnings("ignore")
# Imports,as of 3/10,all are necessary
import numpy as np
import tensorflow as tf
from keras import layers
from keras import backend as K
from keras.models import Model
from keras.optimizers import Adam,SGD
from keras.callbacks import Callback
from keras.layers import Conv3D,Input,Dense,Activation,BatchNormalization,Flatten,Add,Softmax
from sklearn.model_selection import StratifiedKFold
from DonghyunMBCNN import MultiBranchCNN
# Global Variables
# The directory of the process data,must have been converted and cropped,reference dataProcessing.py and crop.py
DATA_DIR = "../datasets/BCICIV_2a_cropped/"
# Which trial subject will be trained
SUBJECT = 1
# The number of classification categories,for motor imagery,there are 4
NUM_CLASSES = 4
# The number of timesteps in each input array
TIMESTEPS = 240
# The X-Dimension of the dataset
XDIM = 7
# The Y-Dimension of the dataset
YDIM = 6
# The delta loss requirement for lower training rate
LOSS_THRESHOLD = 0.01
# Initial learning rate for ADAM optimizer
INIT_LR = 0.01
# Define Which NLL (Negative Log Likelihood) Loss function to use,either "NLL1","NLL2",or "SCCE"
LOSS_FUNCTION = 'NLL2'
# Defines which optimizer is in use,either "ADAM" or "SGD"
OPTIMIZER = 'SGD'
# Whether training output should be given
VERBOSE = 1
# Determines whether K-Fold Cross Validation is used
USE_KFOLD = False
# Number of ksplit validation,must be atleast 2
KFOLD_NUM = 2
# Specifies which model structure will be used,'1' corresponds to the Create_Model function and '2' corresponds to Donghyun's model.
USE_STRUCTURE = '2'
# Number of epochs to train for
EPOCHS = 10
# Receptive field sizes
SRF_SIZE = (2,2,1)
MRF_SIZE = (2,3)
LRF_SIZE = (2,5)
# Strides for each receptive field
SRF_STRIDES = (2,1)
MRF_STRIDES = (2,2)
LRF_STRIDES = (2,4)
# This is meant to handle the reduction of the learning rate,current is not accurate,I have been unable to access the loss information from each Epoch
# The expectation is that if the delta loss is < threshold,learning rate *= 0.1. Threshold has not been set yet.
class LearningRateReducerCb(Callback):
def __init__(self):
self.history = {}
def on_epoch_end(self,epoch,logs={}):
for k,v in logs.items():
self.history.setdefault(k,[]).append(v)
fin_index = len(self.history['loss']) - 1
if (fin_index >= 1):
if (self.history['loss'][fin_index-1] - self.history['loss'][fin_index] > LOSS_THRESHOLD):
old_lr = self.model.optimizer.lr.read_value()
new_lr = old_lr*0.1
print("\nEpoch: {}. Reducing Learning Rate from {} to {}".format(epoch,old_lr,new_lr))
self.model.optimizer.lr.assign(new_lr)
# The Negative Log Likelihood function
def Loss_FN1(y_true,y_pred,sample_weight=None):
return K.sum(K.binary_crossentropy(y_true,y_pred),axis=-1) # This is another loss function that I tried,was less effective
# Second NLL function,generally seems to work better
def Loss_FN2(y_true,sample_weight=None):
n_dims = int(int(y_pred.shape[1])/2)
mu = y_pred[:,0:n_dims]
logsigma = y_pred[:,n_dims:]
mse = -0.5*K.sum(K.square((y_true-mu)/K.exp(logsigma)),axis=1)
sigma_trace = -K.sum(logsigma,axis=1)
log2pi = -0.5*n_dims*np.log(2*np.pi)
log_likelihood = mse+sigma_trace+log2pi
return K.mean(-log_likelihood)
# Loads given data into two arrays,x and y,while also ensuring that all values are formatted as float32s
def load_data(data_dir,num):
x = np.load(data_dir + "A0" + str(num) + "TD_cropped.npy").astype(np.float32)
y = np.load(data_dir + "A0" + str(num) + "TK_cropped.npy").astype(np.float32)
return x,y
def create_receptive_field(size,strides,model,name):
modelRF = Conv3D(kernel_size = size,strides=strides,filters=32,padding='same',name=name+'1')(model)
modelRF1 = BatchNormalization()(modelRF)
modelRF2 = Activation('elu')(modelRF1)
modelRF3 = Conv3D(kernel_size = size,filters=64,name=name+'2')(modelRF2)
modelRF4 = BatchNormalization()(modelRF3)
modelRF5 = Activation('elu')(modelRF4)
modelRF6 = Flatten()(modelRF5)
modelRF7 = Dense(32)(modelRF6)
modelRF8 = BatchNormalization()(modelRF7)
modelRF9 = Activation('relu')(modelRF8)
modelRF10 = Dense(32)(modelRF9)
modelRF11 = BatchNormalization()(modelRF10)
modelRF12 = Activation('relu')(modelRF11)
return Dense(NUM_CLASSES,activation='softmax')(modelRF12)
def Create_Model():
# Model Creation
model1 = Input(shape=(1,XDIM,YDIM,TIMESTEPS))
# 1st Convolution Layer
model1a = Conv3D(kernel_size = (3,3,5),strides = (2,4),filters=16,name="Conv1")(model1)
model1b = BatchNormalization()(model1a)
model1c = Activation('elu')(model1b)
# Small Receptive Field (SRF)
modelSRF = create_receptive_field(SRF_SIZE,SRF_STRIDES,model1c,'SRF')
# Medium Receptive Field (MRF)
modelMRF = create_receptive_field(MRF_SIZE,MRF_STRIDES,'MRF')
# Large Receptive Field (LRF)
modelLRF = create_receptive_field(LRF_SIZE,LRF_STRIDES,'LRF')
# Add the layers - This sums each layer
final = Add()([modelSRF,modelMRF,modelLRF])
out = Softmax()(final)
model = Model(inputs=model1,outputs=out)
return model
if (LOSS_FUNCTION == 'NLL1'):
loss_function = Loss_FN1
elif (LOSS_FUNCTION == 'NLL2'):
loss_function = Loss_FN2
elif (LOSS_FUNCTION == 'SCCE'):
loss_function = 'sparse_categorical_crossentropy'
# Optimizer is given as ADAM with an initial learning rate of 0.01
if (OPTIMIZER == 'ADAM'):
opt = Adam(learning_rate = INIT_LR)
elif (OPTIMIZER == 'SGD'):
opt = SGD(learning_rate = INIT_LR)
X,Y = load_data(DATA_DIR,SUBJECT)
if (USE_KFOLD):
seed = 4
kfold = StratifiedKFold(n_splits=KFOLD_NUM,shuffle=True,random_state=seed)
cvscores = []
for train,test in kfold.split(X,Y):
if (USE_STRUCTURE == '1'):
MRF_model = Create_Model()
elif (USE_STRUCTURE == '2'):
MRF_model = MultiBranchCNN(TIMESTEPS,NUM_CLASSES)
# Compiling the model with the negative log likelihood loss function,ADAM optimizer
MRF_model.compile(loss=loss_function,optimizer=opt,metrics=['accuracy'])
# Training for 30 epochs
MRF_model.fit(X[train],Y[train],epochs=30,verbose=VERBOSE)
# Evaluating the effectiveness of the model
scores = MRF_model.evaluate(X[test],Y[test],verbose=VERBOSE)
print("%s: %.2f%%" % (MRF_model.metrics_names[1],scores[1]*100))
cvscores.append(scores[1]*100)
print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores),np.std(cvscores)))
else:
if (USE_STRUCTURE == '1'):
MRF_model = Create_Model()
elif (USE_STRUCTURE == '2'):
MRF_model = MultiBranchCNN(TIMESTEPS,NUM_CLASSES)
MRF_model.compile(loss=loss_function,metrics=['accuracy'])
MRF_model.fit(X,Y,epochs=EPOCHS,verbose=VERBOSE)
_,acc = MRF_model.evaluate(X,verbose=VERBOSE)
print("Accuracy: %.2f" % (acc*100))
数据来自 BCICIV 2A 数据集,包含 25 个通道。 3 个 EOG 通道被忽略,剩下 22 个通道。这 22 个通道被格式化为一个 7x6 - 0 填充阵列,以提供更具空间相关性的表示。我们使用滑动窗口方法来补偿一个小数据集,然后还在每次试验中运行通道平均 进一步处理数据。训练结果如下。
Epoch 1/10
666/666 [==============================] - 13s 17ms/step - loss: 4.0290 - accuracy: 0.3236
Epoch 2/10
666/666 [==============================] - 12s 18ms/step - loss: 3.9622 - accuracy: 0.3434
Epoch 3/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9747 - accuracy: 0.3481
Epoch 4/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9373 - accuracy: 0.3720
Epoch 5/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9412 - accuracy: 0.3710
Epoch 6/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9191 - accuracy: 0.3829
Epoch 7/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9234 - accuracy: 0.3936
Epoch 8/10
666/666 [==============================] - 14s 21ms/step - loss: 3.8973 - accuracy: 0.3983
Epoch 9/10
666/666 [==============================] - 14s 21ms/step - loss: 3.8780 - accuracy: 0.4022
Epoch 10/10
666/666 [==============================] - 14s 21ms/step - loss: 3.8647 - accuracy: 0.3900
666/666 [==============================] - 5s 8ms/step - loss: 4.1935 - accuracy: 0.2500
Accuracy: 25.00
不考虑准确率低,训练后准确率下降到 25.00 的事实令人担忧。我有一半觉得我遗漏了一些简单的东西,但一直无法解决问题。
欢迎任何建议或问题,非常感谢!
解决方法
我可以想到导致您观察到的差异的两个潜在原因,但我现在没有时间测试它们:
- 您正在使用的优化器
SGD
和Adam
都使用行的子集进行训练,而不是整个数据集。这会导致您观察到的不一致。 - BatchNorm 在训练时间和评估时间的工作方式不同。
两种情况都指向同一个方向:训练期间对准确率和损失的评估是对每批结果的汇总估计,在这种情况下过于乐观。
对于测试 1,您可以尝试将 batch_size
中的 fit
设置为 len(X)
。请注意,您可能会耗尽内存,而且速度肯定会很慢(可能会非常慢)。
对于测试 2,您可以尝试删除 BatchNorm 步骤。
如果您按照这些思路工作,请通知我!
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。