How to fix wandb - RuntimeError: CUDA out of memory
I am trying to run the model from the hyperparameter optimization example in the Simple Transformers documentation, but after a certain number of iterations of the hyperparameter sweep, a CUDA out-of-memory error occurs. The GPU memory allocation also keeps growing while the sweep runs.
Here is the memory allocation graph:
Here is my code, which I ran on Google Colab. How can I fix this error?
import logging

import pandas as pd
import sklearn
import wandb
from simpletransformers.classification import (
    ClassificationArgs,
    ClassificationModel,
)

sweep_config = {
    "method": "bayes",  # grid, random
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "num_train_epochs": {"values": [2, 3, 5]},
        "learning_rate": {"min": 5e-5, "max": 4e-4},
    },
}

sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")
logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)
# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", "true"],
    ["Frodo was the heir of Isildur", "false"],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", "true"],
    ["Merry was the king of Rohan", "false"],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]
model_args = ClassificationArgs()
model_args.reprocess_input_data = True
model_args.overwrite_output_dir = True
model_args.evaluate_during_training = True
model_args.manual_seed = 4
model_args.use_multiprocessing = True
model_args.train_batch_size = 16
model_args.eval_batch_size = 8
model_args.labels_list = ["true", "false"]
model_args.wandb_project = "Simple Sweep"
def train():
    # Initialize a new wandb run
    wandb.init()

    # Create a TransformerModel
    model = ClassificationModel(
        "roberta",
        "roberta-base",
        use_cuda=True,
        args=model_args,
        sweep_config=wandb.config,
    )

    # Train the model
    model.train_model(train_df, eval_df=eval_df)

    # Evaluate the model
    model.eval_model(eval_df)

    # Sync wandb
    wandb.join()

wandb.agent(sweep_id, train)
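One detail that may matter here: each sweep trial builds a fresh ClassificationModel inside train(), and if the previous trial's model is still referenced when the next trial starts, PyTorch's caching allocator keeps holding its GPU memory, which would match the steadily growing allocation graph. A minimal sketch of explicitly releasing the model between trials (the helper name is my own, and the torch import is guarded so the sketch also runs on a machine without CUDA):

```python
import gc


def release_gpu_memory(model=None):
    """Drop references to the finished trial's model and reclaim memory.

    Hypothetical helper for illustration, not part of simpletransformers.
    """
    if model is not None:
        del model
    gc.collect()  # free Python-side references first
    try:
        import torch
        if torch.cuda.is_available():
            # Return cached, now-unused CUDA blocks to the driver
            torch.cuda.empty_cache()
    except ImportError:
        pass  # no torch installed; nothing GPU-side to release


# At the end of each sweep trial, e.g. as the last lines of train():
#     release_gpu_memory(model)
#     wandb.join()
```

empty_cache() only returns blocks that are no longer referenced, so deleting the model and running the garbage collector first is the part that actually makes memory reclaimable.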