BertModel for binary classification: how to evaluate the results
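I am building a fine-tuned BERT model for classification (with a linear layer at the end). The prediction should simply be 1/0 (yes/no).
When writing the evaluation part, I saw some people online apply F.log_softmax to the logits and then use np.argmax to get the predicted label. However, I have also seen people apply np.argmax directly to the logits without any softmax. Which one should I follow, and how do I decide?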
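A quick way to see why both approaches give the same labels: log-softmax subtracts the same constant from every entry of a row, softmax exponentiates and then divides by a per-row constant, and the sigmoid is strictly increasing, so none of them can change which column holds the largest value. The minimal NumPy sketch below checks this; `log_softmax`, `softmax` and `sigmoid` are hand-written stand-ins for the PyTorch functions, the first three input rows are taken from the logits printed later in the question, and the last row is a hypothetical one added so that class 1 wins at least once.

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability, then subtract log-sum-exp.
    shifted = z - z.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def softmax(z: np.ndarray) -> np.ndarray:
    return np.exp(log_softmax(z))


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


# Raw logits, shape (batch, num_labels=2).
logits = np.array([[1.1261, -1.8547],
                   [0.1150, -0.3344],
                   [0.1115, -0.3096],
                   [-0.5, 0.7]])  # hypothetical row where class 1 wins

pred_raw = np.argmax(logits, axis=1)
pred_log_softmax = np.argmax(log_softmax(logits), axis=1)
pred_softmax = np.argmax(softmax(logits), axis=1)
pred_sigmoid = np.argmax(sigmoid(logits), axis=1)

# Every transform preserves the order within a row, so the labels agree.
print(pred_raw, pred_log_softmax, pred_softmax, pred_sigmoid)
```

In other words, `np.argmax` on the raw logits is enough to get predicted labels; a softmax is only needed when you want the probabilities themselves, for example to apply a cut-off other than 0.5.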
Here is my model definition:
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel


class ReviewClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = 2
        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        embedding_size = config.hidden_size
        # One output per class; must match self.num_labels used in the loss below.
        self.classifier = nn.Linear(embedding_size, self.num_labels)
        self.init_weights()

    def forward(self, review_input_ids=None, review_attention_mask=None,
                review_token_type_ids=None, agent_input_ids=None,
                agent_attention_mask=None, agent_token_type_ids=None, labels=None):
        review_outputs = self.bert(
            review_input_ids, attention_mask=review_attention_mask,
            token_type_ids=review_token_type_ids,
            position_ids=None, head_mask=None, inputs_embeds=None)
        # Pooled [CLS] representation, shape (batch_size, hidden_size);
        # apply the dropout layer defined in __init__.
        feature = self.dropout(review_outputs[1])
        # nn.CrossEntropyLoss applies F.log_softmax and nn.NLLLoss internally,
        # so pass it the raw logits (no softmax here).
        logits = self.classifier(feature)  # (batch_size, num_labels)
        outputs = (logits,)  # + outputs[2:] to add hidden states and attentions if present
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs
        return outputs  # (loss, logits) when labels are given, otherwise (logits,)
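The comment in `forward` says `nn.CrossEntropyLoss` applies log-softmax and NLL internally, which is why the model returns raw logits. A minimal NumPy sketch of that computation with the default `reduction='mean'` (hand-written helpers, not the PyTorch implementation), using the first two logit rows from the question with assumed labels 0 and 1:

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    shifted = z - z.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    # Log-softmax over the class dimension, then take minus the log-probability
    # of the true class for each example (the NLL step) and average over the batch.
    log_probs = log_softmax(logits)
    nll = -log_probs[np.arange(len(labels)), labels]
    return float(nll.mean())


logits = np.array([[1.1261, -1.8547],
                   [0.6066, -1.1498]])
labels = np.array([0, 1])  # assumed labels for illustration
loss = cross_entropy(logits, labels)
print(round(loss, 4))
```

The two per-example terms are 0.0495 and 1.9157, matching the log-softmax values printed in the question, and the loss is their mean. Because the result is already averaged over the batch, summing it per batch and dividing by the number of batches gives the average per-example loss (up to a smaller last batch).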
And here is my validation code:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from sklearn.metrics import classification_report
from tqdm import tqdm


def model_validate(model, data_loader):
    # Put the model in evaluation mode: dropout layers behave differently
    # during evaluation.
    model.eval()
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    label_prop = data_loader.dataset.dataset.label_prop()
    total_valid_loss = 0
    num_batch = len(data_loader)
    y_pred, y_true = [], []
    # Evaluate data
    for step, batch in tqdm(enumerate(data_loader), desc="Validation...", total=num_batch):
        b_review_input_ids = batch["review_input_ids"].to(device)
        b_review_attention_mask = batch["review_attention_mask"].to(device)
        b_review_token_type_ids = batch["review_token_type_ids"].to(device)
        b_binarized_label = batch["binarized_label"].to(device)
        # Tell PyTorch not to construct the compute graph during the forward
        # pass, since it is only needed for backprop (training).
        with torch.no_grad():
            # With labels passed in, the model returns (loss, logits).
            loss, logits = model(
                review_input_ids=b_review_input_ids,
                review_attention_mask=b_review_attention_mask,
                review_token_type_ids=b_review_token_type_ids,
                labels=b_binarized_label,
            )
        # .mean() also handles nn.DataParallel, which returns one loss per GPU.
        total_valid_loss += loss.mean().item()
        # Raw logits are enough for argmax; no softmax is needed to pick the label.
        numpy_logits = logits.detach().cpu().numpy()
        y_pred.extend(np.argmax(numpy_logits, axis=1).flatten())
        y_true.extend(b_binarized_label.cpu().numpy())
    # End of an epoch of validation: put the model back in train mode.
    model.train()
    # nn.CrossEntropyLoss already averages over each batch, so divide by the
    # number of batches only.
    ave_loss = total_valid_loss / num_batch
    # Compute the various F1 scores for each label.
    report = classification_report(y_true, y_pred, output_dict=True)
    metrics_df = pd.DataFrame(report).transpose()
    metrics_df = metrics_df.sort_index()
    weighted_f1_score = metrics_df.loc['weighted avg', 'f1-score']
    averaged_f1_score = metrics_df.loc['macro avg', 'f1-score']
    return ave_loss, metrics_df, {
        "weighted": weighted_f1_score, "averaged": averaged_f1_score,
    }
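With a rare positive class, the 'weighted avg' and 'macro avg' rows of `classification_report` can tell very different stories. The sketch below recomputes both by hand (the helper `f1_per_class` is hypothetical, written to mirror how those two rows are defined: macro is the plain mean over classes, weighted uses each class's support) on hypothetical labels where the model never predicts "yes":

```python
import numpy as np


def f1_per_class(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int = 2) -> np.ndarray:
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # 2*TP / (2*TP + FP + FN) is the harmonic mean of precision and recall;
        # use 0 when there are no true positives to count.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return np.array(scores, dtype=float)


# Hypothetical validation labels: 18 "no" and 2 "yes"; the model always says "no".
y_true = np.array([0] * 18 + [1] * 2)
y_pred = np.zeros_like(y_true)

f1 = f1_per_class(y_true, y_pred)
support = np.bincount(y_true, minlength=2)
macro_f1 = f1.mean()                                 # like 'macro avg'
weighted_f1 = (f1 * support).sum() / support.sum()   # like 'weighted avg'
print(f1.round(4), round(macro_f1, 4), round(weighted_f1, 4))
```

Here the weighted F1 is about 0.85 even though not a single "yes" is found, while the macro F1 (about 0.47) and the class-1 F1 (0) expose the problem. For an imbalanced yes/no task, the class-1 row of the report is usually the more informative number to track.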
Another version I am trying is:
# Inside the validation loop; F is torch.nn.functional.
transformed_logits = F.log_softmax(logits, dim=1)  # log-probabilities
numpy_probas = transformed_logits.detach().cpu().numpy()
y_pred.extend(np.argmax(numpy_probas, axis=1).flatten())
y_true.extend(b_binarized_label.cpu().numpy())
The third version I tried is:
# Inside the validation loop.
transformed_logits = torch.sigmoid(logits)  # element-wise, each column independently
numpy_probas = transformed_logits.detach().cpu().numpy()
y_pred.extend(np.argmax(numpy_probas, axis=1).flatten())
y_true.extend(b_binarized_label.cpu().numpy())
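I also don't know how to interpret the results. From what I read online, if I set dim=1 for log_softmax, the probabilities over all features (classes) should sum to 1. However, here are some examples: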
Here is the logits output (for one batch, batch size = 16, num_labels = 2):
tensor([[ 1.1261, -1.8547],
        [ 0.6066, -1.1498],
        [ 1.3667, -2.0078],
        [ 2.0652, -2.6669],
        [ 1.0388, -1.7555],
        [ 0.6801, -1.1652],
        [ 0.8315, -1.3860],
        [ 1.5685, -2.2362],
        [ 0.1150, -0.3344],
        [ 2.0751, -2.6166],
        [ 1.5033, -2.1702],
        [ 0.1115, -0.3096],
        [ 0.8610, -1.4834],
        [ 1.5544, -2.2773],
        [ 2.1014, -2.6533],
        [ 0.7789, -1.3748]], device='cuda:0')
If I first apply log-softmax, i.e. F.log_softmax(logits, dim=1), I get:
tensor([[-0.0495, -3.0302],
        [-0.1593, -1.9157],
        [-0.0337, -3.4082],
        [-0.0088, -4.7409],
        [-0.0594, -2.8537],
        [-0.1467, -1.9920],
        [-0.1033, -2.3209],
        [-0.0220, -3.8267],
        [-0.4935, -0.9429],
        [-0.0091, -4.7008],
        [-0.0251, -3.6985],
        [-0.5046, -0.9257],
        [-0.0916, -2.4360],
        [-0.0214, -3.8531],
        [-0.0086, -4.7632],
        [-0.1098, -2.2635]], device='cuda:0')
The rows do not sum to 1, and the values don't look like probabilities to me either.
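`F.log_softmax` returns log-probabilities, so every value is at most 0 and it is their exponentials that sum to 1 in each row. A small NumPy check on the first three rows of the log-softmax output above (rounded to four decimals as printed):

```python
import numpy as np

# First three rows of F.log_softmax(logits, dim=1) from the question.
log_probs = np.array([[-0.0495, -3.0302],
                      [-0.1593, -1.9157],
                      [-0.0337, -3.4082]])

probs = np.exp(log_probs)     # back to probabilities, i.e. softmax(logits, dim=1)
row_sums = probs.sum(axis=1)  # about 1.0 per row, up to rounding of the printed values
print(probs.round(4))
print(row_sums.round(4))
```

In PyTorch the same probabilities come from `F.softmax(logits, dim=1)` or `F.log_softmax(logits, dim=1).exp()`.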
If I use sigmoid instead, i.e. torch.sigmoid(logits), I get:
tensor([[0.7551, 0.1353],
        [0.6472, 0.2405],
        [0.7969, 0.1184],
        [0.8875, 0.0650],
        [0.7386, 0.1474],
        [0.6638, 0.2377],
        [0.6967, 0.2000],
        [0.8276, 0.0965],
        [0.5287, 0.4172],
        [0.8885, 0.0681],
        [0.8181, 0.1025],
        [0.5278, 0.4232],
        [0.7029, 0.1849],
        [0.8255, 0.0930],
        [0.8910, 0.0658],
        [0.6854, 0.2018]], device='cuda:0')
This does look more like probabilities, although the rows still don't sum to 1.
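`torch.sigmoid` on a two-column output squashes each logit independently, so nothing forces the two values in a row to sum to 1. For two classes, the softmax probability of class 1 equals the sigmoid of the logit difference z1 - z0. A NumPy sketch of that identity, with hand-written `sigmoid` and `softmax` helpers standing in for the PyTorch functions:

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# First and ninth logit rows from the question.
logits = np.array([[1.1261, -1.8547],
                   [0.1150, -0.3344]])

elementwise = sigmoid(logits)                          # what torch.sigmoid(logits) computes
p_yes_softmax = softmax(logits)[:, 1]                  # P(label = 1) from a 2-way softmax
p_yes_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])   # same value via the logit difference
print(elementwise.round(4), elementwise.sum(axis=1).round(4))
print(p_yes_softmax.round(4), p_yes_sigmoid.round(4))
```

With two outputs trained by `nn.CrossEntropyLoss`, softmax is the transform that matches the loss. A sigmoid is the natural choice when the head has a single output trained with `nn.BCEWithLogitsLoss`.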
Whichever version I use, the predictions in this case are always the same, because my 1 (yes) label occurs very rarely:
array([0,0])
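For two classes, argmax over softmax probabilities is the same as predicting "yes" when P(yes) > 0.5. When "yes" is rare, one common option is to choose a lower cut-off on P(yes) using the validation set (another is passing class weights through the `weight` argument of `nn.CrossEntropyLoss`). A minimal sketch of threshold selection by class-1 F1; the helper names, the threshold grid and the validation arrays are all hypothetical:

```python
import numpy as np


def f1_positive(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # F1 for class 1 only.
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def best_threshold(p_yes, y_true, grid=np.linspace(0.05, 0.95, 19)):
    # Try each cut-off on P(yes) and keep the first one with the highest class-1 F1.
    scores = [f1_positive(y_true, (p_yes >= t).astype(int)) for t in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), scores[best]


# Hypothetical validation data: P(yes) = softmax(logits)[:, 1] and the true labels.
p_yes = np.array([0.07, 0.13, 0.22, 0.31, 0.37, 0.42, 0.09, 0.14])
y_true = np.array([0, 0, 0, 1, 1, 0, 0, 0])

argmax_pred = (p_yes > 0.5).astype(int)  # what argmax does for two classes
t, f1 = best_threshold(p_yes, y_true)
print(f1_positive(y_true, argmax_pred), round(t, 2), f1)
```

On this toy data argmax never predicts "yes" (class-1 F1 of 0), while a cut-off of 0.25 reaches 0.8. If you tune the threshold this way, report final metrics on a separate test set, since a threshold picked on the validation data gives an optimistic validation score.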