How to make prediction efficient on a GPU for NLP tasks with PyTorch models from the transformers library
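I have a large dataset of tweet texts (nearly 3 billion tweets). I trained a BERT model for classification on an annotated dataset.
Is there a way to make prediction more efficient and faster? I only have access to one GPU.
At the moment, I use the following code: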
import sys

import pandas as pd
from transformers import pipeline

# Text-classification pipeline on GPU 0, using the fine-tuned model.
judge = pipeline(
    task="sentiment-analysis",
    model="/trained_disasterlabels",
    tokenizer="bert-base-uncased",
    device=0,
)

id_disaster = sys.argv[1]
path = "/disaster_data/ids_" + id_disaster + "_text"
out_path = path + "_labelpred"
open(out_path, "w").close()  # create or truncate the output file

n = 200  # batch size


def predict_and_append(batch):
    # Classify one batch and append "label,prob" rows to the output file.
    preds = judge(batch)
    preds_labels = [x["label"].replace("LABEL_", "") for x in preds]
    preds_probs = [round(x["score"], 4) for x in preds]
    valsdf = pd.DataFrame({"labels": preds_labels, "probs": preds_probs})
    valsdf.to_csv(out_path, mode="a", header=False, index=False)


with open(path, "rb") as f:
    batch = []
    for line in f:
        batch.append(line.decode().rstrip("\n"))
        if len(batch) == n:
            predict_and_append(batch)
            batch = []
    # Last, partial batch: append instead of overwriting the earlier rows,
    # and skip the call entirely when nothing is left.
    if batch:
        predict_and_append(batch)
Before that, I also tried the following approach:
import sys

import pandas as pd
import torch
import torch.nn.functional as F
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("/trained_disasterlabels")
model.eval()
model.to("cuda")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

id_disaster = sys.argv[1]
path = "/disaster_data/ids_" + id_disaster + "_text"
out_path = path + "_labelpred"
open(out_path, "w").close()  # create or truncate the output file

n = 100  # batch size


def predict_and_append(batch):
    # Tokenize with padding so the batch forms one tensor, then move it to the GPU.
    inputs = tokenizer(batch, add_special_tokens=True, return_tensors="pt",
                       padding=True, truncation=True)
    inputs = inputs.to("cuda")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs[0] holds the classification logits; softmax gives one probability per class.
    probs = F.softmax(outputs[0], dim=1).cpu().numpy()
    # One row per text, one column per class, appended without a header.
    pd.DataFrame(probs.tolist()).to_csv(out_path, mode="a", header=False, index=False)


with open(path, "rb") as f:
    batch = []
    for line in f:
        batch.append(line.decode().rstrip("\n"))
        if len(batch) == n:
            predict_and_append(batch)
            batch = []
    # Last, partial batch (skipped when empty).
    if batch:
        predict_and_append(batch)
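Both snippets repeat the same "fill a list until it reaches n lines" loop and then need separate handling for the last, partial batch. One way to keep that logic in a single place is a small generator; the sketch below is only an illustration. `iter_batches` is a hypothetical helper name, not part of transformers, and it assumes the input yields text lines (for example a file opened in text mode rather than "rb").

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size lines with trailing newlines removed.

    The final list may be shorter than batch_size; an empty list is never yielded.
    """
    it = iter(lines)
    while True:
        # islice pulls up to batch_size items without reading the whole input.
        batch = [line.rstrip("\n") for line in islice(it, batch_size)]
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    sample = ["first tweet\n", "second tweet\n", "third tweet\n"]
    for batch in iter_batches(sample, 2):
        print(len(batch), batch)
```

Each yielded batch can then be passed to the prediction call in place of the manual `len(batch)==n` check, so the last partial batch goes through the same code path as every other batch.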