微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

培训和验证准确性提高,培训损失减少-验证损失为NaN

如何解决培训和验证准确性提高,培训损失减少-验证损失为NaN

我正在针对cats vs dogs数据训练分类器模型。该模型是resnet18的次要变体,并返回类别的softmax概率。但是,我注意到验证损失主要是NaN ,而训练损失却在稳步下降并表现出预期的效果。培训和验证准确性逐个时代提高。

Epoch 1/15
312/312 [==============================] - 1372s 4s/step - loss: 0.7849 - accuracy: 0.5131 - val_loss: nan - val_accuracy: 0.5343
Epoch 2/15
312/312 [==============================] - 1372s 4s/step - loss: 0.6966 - accuracy: 0.5539 - val_loss: 13989871201999266517090304.0000 - val_accuracy: 0.5619
Epoch 3/15
312/312 [==============================] - 1373s 4s/step - loss: 0.6570 - accuracy: 0.6077 - val_loss: 747123703808.0000 - val_accuracy: 0.5679
Epoch 4/15
312/312 [==============================] - 1372s 4s/step - loss: 0.6180 - accuracy: 0.6483 - val_loss: nan - val_accuracy: 0.6747
Epoch 5/15
312/312 [==============================] - 1373s 4s/step - loss: 0.5838 - accuracy: 0.6852 - val_loss: nan - val_accuracy: 0.6240
Epoch 6/15
312/312 [==============================] - 1372s 4s/step - loss: 0.5338 - accuracy: 0.7301 - val_loss: 31236203781405710523301888.0000 - val_accuracy: 0.7590
Epoch 7/15
312/312 [==============================] - 1373s 4s/step - loss: 0.4872 - accuracy: 0.7646 - val_loss: 52170.8672 - val_accuracy: 0.7378
Epoch 8/15
312/312 [==============================] - 1372s 4s/step - loss: 0.4385 - accuracy: 0.7928 - val_loss: 2130819335420217655296.0000 - val_accuracy: 0.8101
Epoch 9/15
312/312 [==============================] - 1373s 4s/step - loss: 0.3966 - accuracy: 0.8206 - val_loss: 116842888.0000 - val_accuracy: 0.7857
Epoch 10/15
312/312 [==============================] - 1372s 4s/step - loss: 0.3643 - accuracy: 0.8391 - val_loss: nan - val_accuracy: 0.8199
Epoch 11/15
312/312 [==============================] - 1373s 4s/step - loss: 0.3285 - accuracy: 0.8557 - val_loss: 788904.2500 - val_accuracy: 0.8438
Epoch 12/15
312/312 [==============================] - 1372s 4s/step - loss: 0.3029 - accuracy: 0.8670 - val_loss: nan - val_accuracy: 0.8245
Epoch 13/15
312/312 [==============================] - 1373s 4s/step - loss: 0.2857 - accuracy: 0.8781 - val_loss: 121907.8594 - val_accuracy: 0.8444
Epoch 14/15
312/312 [==============================] - 1373s 4s/step - loss: 0.2585 - accuracy: 0.8891 - val_loss: nan - val_accuracy: 0.8674
Epoch 15/15
312/312 [==============================] - 1374s 4s/step - loss: 0.2430 - accuracy: 0.8965 - val_loss: 822.7968 - val_accuracy: 0.8776

我检查了以下内容-

  • 验证数据中的Infinity / NaN
  • 规范化数据(使用tf.keras.applications.resnet.preprocess_input时引起的Infinity / NaN
  • 如果模型仅预测一个类,从而导致损失函数表现异常

培训代码以供参考-

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-3)
model = resnet18(NUM_CLASSES=NUM_CLASSES) # variant of original model
model.compile(optimizer=optimizer,loss="categorical_crossentropy",metrics=["accuracy"])

history = model.fit(
    train_dataset,steps_per_epoch=len(X_train) // BATCH_SIZE,epochs=EPOCHS,validation_data=valid_dataset,validation_steps=len(X_valid) // BATCH_SIZE,verbose=1,)

我找到的最相关的答案是已接受的答案here的最后一段。但是,这里的情况似乎并非如此,因为与训练损失和回报率相比,验证损失的数量级有所不同。似乎损失函数行为不正常。

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。