How to add a dense layer on top of a Huggingface BERT model
I want to add a dense layer on top of the bare BERT Model transformer that outputs raw hidden states, and then fine-tune the resulting model. Specifically, I am using this base model. This is what the model should do: encode a sentence, keep only the embedding of the first token, and feed it through a dense layer to produce the desired output.
So far, I have managed to encode the sentences:
from sklearn.neural_network import MLPRegressor
import torch
from transformers import AutoModel, AutoTokenizer

# List of strings
sentences = [...]
# List of numbers
labels = [...]

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")

# 2D array, one row per sentence, containing the embedding of the first token
encoded_sentences = torch.stack([model(**tokenizer(s, return_tensors='pt'))[0][0][0]
                                 for s in sentences]).detach().numpy()

regr = MLPRegressor()
regr.fit(encoded_sentences, labels)
This way I can train a neural network by feeding it the encoded sentences. However, this approach obviously does not fine-tune the underlying BERT model. Can anybody help me? How can I build a model (possibly in PyTorch or with the Huggingface library) that can be fully fine-tuned?
Solution
There are two ways to do this. Since you are looking to fine-tune the model for a downstream task similar to classification, you can directly use the BertForSequenceClassification class, which fine-tunes a logistic-regression-style layer on top of the 768-dimensional output, as sketched below.
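For illustration, here is a minimal sketch of this first option, assuming a 3-class task (the checkpoint matches the question; the sentence and label are made up):

import torch
from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = BertForSequenceClassification.from_pretrained(
    "dbmdz/bert-base-italian-xxl-cased", num_labels=3)

inputs = tokenizer(["Una frase di esempio"], return_tensors='pt', padding=True)
labels = torch.tensor([0])  # made-up label for the example

# Passing labels makes the model return the classification loss as well,
# so every weight (BERT included) can be fine-tuned end to end.
outputs = model(**inputs, labels=labels)
outputs.loss.backward()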
Alternatively, you can define a custom module that creates a BERT model from the pretrained weights and adds your own layers on top of it:
import torch
import torch.nn as nn
from transformers import AutoTokenizer, BertModel

class CustomBERTModel(nn.Module):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = BertModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
        ### New layers:
        self.linear1 = nn.Linear(768, 256)
        self.linear2 = nn.Linear(256, 3)  ## 3 is the number of classes in this example

    def forward(self, ids, mask):
        # return_dict=False keeps the (sequence_output, pooled_output) tuple interface
        sequence_output, pooled_output = self.bert(
            ids, attention_mask=mask, return_dict=False)
        # sequence_output has the following shape: (batch_size, sequence_length, 768)
        linear1_output = self.linear1(sequence_output[:, 0, :].view(-1, 768))  ## extract the 1st token's embeddings
        linear2_output = self.linear2(linear1_output)
        return linear2_output
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = CustomBERTModel()  # You can pass parameters if required to have a more flexible model
model.to(torch.device("cpu"))  ## can be gpu
criterion = nn.CrossEntropyLoss()  ## If required define your own criterion
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))

for epoch in range(epochs):
    for batch in data_loader:  ## If you have a DataLoader() object to get the data
        data = batch[0]
        targets = batch[1]  ## assuming that the data loader returns a tuple of data and its targets

        optimizer.zero_grad()
        encoding = tokenizer.batch_encode_plus(data, return_tensors='pt', padding=True,
                                               truncation=True, max_length=50,
                                               add_special_tokens=True)
        input_ids = encoding['input_ids']
        attention_mask = encoding['attention_mask']
        outputs = model(input_ids, attention_mask)
        # CrossEntropyLoss expects raw logits, so no extra log_softmax is applied here
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
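Once training is done, inference with this custom model follows the same pattern; a minimal sketch (the example sentence is made up):

model.eval()
with torch.no_grad():
    encoding = tokenizer(['Una frase di esempio'], return_tensors='pt',
                         padding=True, truncation=True, max_length=50)
    # forward(ids, mask) returns the raw logits of the final linear layer
    logits = model(encoding['input_ids'], encoding['attention_mask'])
    predicted_classes = logits.argmax(dim=1)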
If you want to tune the BERT model itself, you will need to modify the model's parameters. To do this, you will most likely want to do your work with PyTorch. Here is some rough pseudocode to illustrate:
from torch.optim import SGD

model = ...  # whatever model you are using
parameters = model.parameters()  # or some more specific set of parameters
optimizer = SGD(parameters, lr=.01)  # or whatever optimizer you want
optimizer.zero_grad()  # boiler-platy pytorch function

input = ...  # whatever the appropriate input for your task is
label = ...  # whatever the appropriate label for your task is
loss = model(**input, labels=label)[0]  # usually the loss is the first item returned
loss.backward()  # calculates gradients
optimizer.step()  # runs the optimization algorithm
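If you only want to tune part of the model ("some more specific set of parameters" above), you can filter parameters by name before building the optimizer. A rough sketch, assuming the encoder's parameter names start with "bert." (as in the custom model above; names vary between models):

# Freeze the BERT encoder and only train the layers added on top of it.
for name, param in model.named_parameters():
    if name.startswith('bert.'):
        param.requires_grad = False
parameters = [p for p in model.parameters() if p.requires_grad]
optimizer = SGD(parameters, lr=.01)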
I have left out all the relevant details, because they are quite tedious and specific to your particular task. Huggingface has a nice article detailing this here, and you will definitely want to refer to the PyTorch documentation when using anything from PyTorch. I would highly recommend the pytorch blitz before trying to do anything serious with it.