How to fix training loss not decreasing after converting PyTorch code to TensorFlow
I am training a RoBERTa model with transformers. After a few steps the training loss stops decreasing, and I can't find the reason. Any advice would be appreciated.
Here is the model:
class TriggerExtractor(keras.Model):
    def __init__(self, bert_dir, dropout_prob=0.1, use_distant_trigger=True, **kwargs):
        super(TriggerExtractor, self).__init__()
        config_path = os.path.join(bert_dir, 'config.json')
        assert os.path.exists(bert_dir) and os.path.exists(config_path), \
            'pretrained bert file does not exist'
        self.bert_module = TFBertModel.from_pretrained(bert_dir)
        self.bert_config = self.bert_module.config
        self.use_distant_trigger = use_distant_trigger

        out_dims = self.bert_config.hidden_size
        if use_distant_trigger:
            embedding_dim = kwargs.pop('embedding_dims', 256)
            self.distant_trigger_embedding = keras.layers.Embedding(
                input_dim=3, output_dim=embedding_dim,
                embeddings_initializer=keras.initializers.HeNormal())
            out_dims += embedding_dim

        mid_linear_dims = kwargs.pop('mid_linear_dims', 128)
        self.mid_linear = keras.Sequential([
            keras.layers.Dense(mid_linear_dims, input_shape=(out_dims,), activation=None),
            keras.layers.ReLU(),
            keras.layers.Dropout(dropout_prob)
        ])
        self.classifier = keras.layers.Dense(2, input_shape=(mid_linear_dims,), activation=None)
        self.criterion = keras.losses.BinaryCrossentropy()
    def call(self, inputs):
        token_ids = inputs['token_ids']
        attention_masks = inputs['attention_masks']
        token_type_ids = inputs['token_type_ids']
        distant_trigger = inputs['distant_trigger']
        labels = inputs['labels']

        bert_outputs = self.bert_module(
            input_ids=token_ids, attention_mask=attention_masks, token_type_ids=token_type_ids
        )
        seq_out = bert_outputs[0]

        if self.use_distant_trigger:
            assert distant_trigger is not None, \
                'When using distant trigger features, distant trigger should be implemented'
            distant_trigger_feature = self.distant_trigger_embedding(distant_trigger)
            seq_out = keras.layers.concatenate([seq_out, distant_trigger_feature], axis=-1)

        seq_out = self.mid_linear(seq_out)
        logits = keras.activations.sigmoid(self.classifier(seq_out))

        out = (logits,)
        if labels is not None:
            labels = tf.cast(labels, dtype=tf.float32)
            loss = self.criterion(logits, labels)
            out = (loss,) + out
        return out
Here is the training code:
train_loader = tf.data.Dataset.from_tensor_slices(train_dataset.__dict__).shuffle(10000).batch(opt.train_batch_size)
for epoch in range(opt.train_epochs):
    for step, batch_data in enumerate(train_loader):
        with tf.GradientTape() as tape:
            loss = model(batch_data)
        grads = tape.gradient(loss, model.variables)
        # for (grad, var) in zip(grads, model.variables):
        #     if grad is not None:
        #         name = var.name
        #         space = name.split('/')
        #         if space[0] == 'tf_bert_model':
        #             optimizer_bert.apply_gradients([(tf.clip_by_norm(grad, opt.max_grad_norm), var)])
        #         else:
        #             optimizer_other.apply_gradients([(tf.clip_by_norm(grad, opt.max_grad_norm), var)])
        optimizer.apply_gradients([
            (tf.clip_by_norm(grad, opt.max_grad_norm), var)
            for (grad, var) in zip(grads, model.variables)
            if grad is not None
        ])
        global_step += 1
        if global_step % log_loss_steps == 0:
            avg_loss /= log_loss_steps
            logger.info('epoch:%d Step: %d / %d ----> total loss: %.5f' % (epoch, global_step, t_total, avg_loss))
            avg_loss = 0.
        else:
            avg_loss += loss[0]
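As a sanity check, the clip-then-apply pattern the loop above uses can be exercised in isolation. This is a minimal sketch, assuming a toy one-layer model and random data; none of the names or shapes below come from the original project:

```python
import tensorflow as tf

# Hypothetical stand-in model, just to exercise the gradient pattern
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
optimizer = tf.keras.optimizers.Adam(1e-3)
max_grad_norm = 1.0  # assumed value, plays the role of opt.max_grad_norm

x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))  # scalar loss

grads = tape.gradient(loss, model.trainable_variables)
# Clip each gradient to max_grad_norm, then apply all pairs in one call
optimizer.apply_gradients([
    (tf.clip_by_norm(g, max_grad_norm), v)
    for g, v in zip(grads, model.trainable_variables)
    if g is not None
])
```

Note that `tape.gradient` expects a scalar (or tensors whose sum is taken); if the model returns a `(loss, logits)` tuple, differentiating the whole tuple is not the same as differentiating the loss alone.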
The training log looks like this:
07/13/2021 20:09:22 - INFO - src_final.utils.trainer - ***** Running training *****
07/13/2021 20:09:22 - INFO - src_final.utils.trainer - Num Epochs = 10
07/13/2021 20:09:22 - INFO - src_final.utils.trainer - Total training batch size = 8
07/13/2021 20:09:22 - INFO - src_final.utils.trainer - Total optimization steps = 3070
07/13/2021 20:09:22 - INFO - src_final.utils.trainer - Save model in 307 steps; Eval model in 307 steps
07/13/2021 20:09:36 - INFO - src_final.utils.trainer - epoch:0 Step: 20 / 3070 ----> total loss: 1.73774
07/13/2021 20:09:50 - INFO - src_final.utils.trainer - epoch:0 Step: 40 / 3070 ----> total loss: 0.04631
07/13/2021 20:10:03 - INFO - src_final.utils.trainer - epoch:0 Step: 60 / 3070 ----> total loss: 0.04586
07/13/2021 20:10:17 - INFO - src_final.utils.trainer - epoch:0 Step: 80 / 3070 ----> total loss: 0.04734
07/13/2021 20:10:31 - INFO - src_final.utils.trainer - epoch:0 Step: 100 / 3070 ----> total loss: 0.04554
07/13/2021 20:10:44 - INFO - src_final.utils.trainer - epoch:0 Step: 120 / 3070 ----> total loss: 0.04733
07/13/2021 20:10:58 - INFO - src_final.utils.trainer - epoch:0 Step: 140 / 3070 ----> total loss: 0.04613
07/13/2021 20:11:12 - INFO - src_final.utils.trainer - epoch:0 Step: 160 / 3070 ----> total loss: 0.04643
07/13/2021 20:11:26 - INFO - src_final.utils.trainer - epoch:0 Step: 180 / 3070 ----> total loss: 0.04613
07/13/2021 20:11:39 - INFO - src_final.utils.trainer - epoch:0 Step: 200 / 3070 ----> total loss: 0.04643
07/13/2021 20:11:53 - INFO - src_final.utils.trainer - epoch:0 Step: 220 / 3070 ----> total loss: 0.04553
07/13/2021 20:12:07 - INFO - src_final.utils.trainer - epoch:0 Step: 240 / 3070 ----> total loss: 0.04582
07/13/2021 20:12:21 - INFO - src_final.utils.trainer - epoch:0 Step: 260 / 3070 ----> total loss: 0.04642
07/13/2021 20:12:35 - INFO - src_final.utils.trainer - epoch:0 Step: 280 / 3070 ----> total loss: 0.04582
07/13/2021 20:12:48 - INFO - src_final.utils.trainer - epoch:0 Step: 300 / 3070 ----> total loss: 0.04672
07/13/2021 20:12:53 - INFO - src_final.utils.trainer - Saving model & optimizer & scheduler checkpoint to ./out/final/trigger/roberta_wwm_distant_trigger_pgd/checkpoint-307
07/13/2021 20:13:05 - INFO - src_final.utils.trainer - epoch:1 Step: 320 / 3070 ----> total loss: 0.04582
07/13/2021 20:13:18 - INFO - src_final.utils.trainer - epoch:1 Step: 340 / 3070 ----> total loss: 0.04552
07/13/2021 20:13:32 - INFO - src_final.utils.trainer - epoch:1 Step: 360 / 3070 ----> total loss: 0.04672
07/13/2021 20:13:46 - INFO - src_final.utils.trainer - epoch:1 Step: 380 / 3070 ----> total loss: 0.04762
07/13/2021 20:13:59 - INFO - src_final.utils.trainer - epoch:1 Step: 400 / 3070 ----> total loss: 0.04642
07/13/2021 20:14:13 - INFO - src_final.utils.trainer - epoch:1 Step: 420 / 3070 ----> total loss: 0.04612
07/13/2021 20:14:27 - INFO - src_final.utils.trainer - epoch:1 Step: 440 / 3070 ----> total loss: 0.04582
07/13/2021 20:14:41 - INFO - src_final.utils.trainer - epoch:1 Step: 460 / 3070 ----> total loss: 0.04702
07/13/2021 20:14:54 - INFO - src_final.utils.trainer - epoch:1 Step: 480 / 3070 ----> total loss: 0.04672
07/13/2021 20:15:08 - INFO - src_final.utils.trainer - epoch:1 Step: 500 / 3070 ----> total loss: 0.04672
07/13/2021 20:15:22 - INFO - src_final.utils.trainer - epoch:1 Step: 520 / 3070 ----> total loss: 0.04552
07/13/2021 20:15:36 - INFO - src_final.utils.trainer - epoch:1 Step: 540 / 3070 ----> total loss: 0.04552
07/13/2021 20:15:49 - INFO - src_final.utils.trainer - epoch:1 Step: 560 / 3070 ----> total loss: 0.04672
07/13/2021 20:16:03 - INFO - src_final.utils.trainer - epoch:1 Step: 580 / 3070 ----> total loss: 0.04552
PS: For certain reasons I am porting PyTorch code to TensorFlow; the PyTorch version works fine (https://github.com/WuHuRestaurant/xf_event_extraction2020Top1). Thanks again.
Solution
I found the reason: the order of 'y_pred' and 'y_true' in the loss call. PyTorch and TensorFlow differ here. PyTorch loss functions such as `nn.BCELoss` take the prediction first, `criterion(input, target)`, while Keras losses take the labels first, `loss_fn(y_true, y_pred)`. So `self.criterion(logits, labels)` in `call` should be `self.criterion(labels, logits)`.
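The difference is easy to see in isolation. A small sketch with made-up probabilities (values here are illustrative only):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

y_true = tf.constant([[0.0, 1.0]])  # hard labels
y_pred = tf.constant([[0.1, 0.9]])  # confident, nearly correct predictions

# Keras signature: loss_fn(y_true, y_pred) -- labels FIRST
correct = float(bce(y_true, y_pred))

# Swapped arguments (the PyTorch habit): the hard 0/1 labels get clipped
# as if they were predicted probabilities, producing a very different value
swapped = float(bce(y_pred, y_true))

print(correct, swapped)
```

With the arguments swapped, the loss is dominated by the clipping of the 0/1 "predictions" and barely responds to the model output, which is consistent with a loss that flatlines near a constant after a few steps.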