Why isn't CUDA memory released by torch.cuda.empty_cache()?

On Windows 10, if I create a GPU tensor directly, I can free its memory successfully.

import torch
a = torch.zeros(300000000,dtype=torch.int8,device='cuda')
del a
torch.cuda.empty_cache()

However, if I create an ordinary (CPU) tensor and convert it to a GPU tensor, I can no longer free its memory.

import torch
a = torch.zeros(300000000,dtype=torch.int8)
a.cuda()
del a
torch.cuda.empty_cache()

Why does this happen?

Solution

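At least on Ubuntu, your script does not release the memory when it is run in an interactive shell, but it works as expected when run as a script. I think there is a reference issue with the bare a.cuda() call, whose result is never assigned to a name. The following works both in an interactive shell and as a script.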

import torch
a = torch.zeros(300000000,dtype=torch.int8)
a = a.cuda()
del a
torch.cuda.empty_cache()
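
A minimal sketch of why the assignment matters, using plain Python so it runs without a GPU. It rests on an assumption the answer does not confirm: that the interactive shell keeps the value of the last evaluated expression alive (a plain Python REPL binds it to `_`, and IPython additionally stores it in `Out`). `FakeTensor` and `copy_outlives_del` are hypothetical names invented for this illustration and are not part of PyTorch.

```python
import weakref


class FakeTensor:
    """Hypothetical stand-in for a tensor; not part of PyTorch."""

    def __init__(self, device="cpu"):
        self.device = device

    def cuda(self):
        # Like Tensor.cuda() on a CPU tensor, return a new object
        # instead of modifying self.
        return FakeTensor("cuda")


def copy_outlives_del(keep_last_result):
    """Return True if the "GPU" copy is still alive after `del a`."""
    a = FakeTensor()
    result = a.cuda()
    probe = weakref.ref(result)  # observes the copy without keeping it alive
    # Assumption: emulate a shell that remembers the last expression value.
    underscore = result if keep_last_result else None
    del result
    del a
    return probe() is not None


print(copy_outlives_del(True))   # shell-like: the copy is still referenced
print(copy_outlives_del(False))  # script-like: the copy is freed at once
```

Under this assumption, `del a` in the question only removes the CPU tensor, while the GPU copy stays referenced, so `torch.cuda.empty_cache()` has nothing to return to the driver. Rebinding with `a = a.cuda()` is an assignment statement rather than an expression, so the shell has no result to store.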

Yes, this also happens on my machine, with the following configuration:

20.04.1-Ubuntu
1.7.1+cu110

According to a fastai forum discussion: https://forums.fast.ai/t/gpu-memory-not-being-freed-after-training-is-over/10265/8

this is related to how Python's garbage collector behaves in the IPython environment.

import gc
import torch

def pretty_size(size):
    """Pretty prints a torch.Size object"""
    assert isinstance(size, torch.Size)
    return " × ".join(map(str, size))

def dump_tensors(gpu_only=True):
    """Prints a list of the Tensors being tracked by the garbage collector."""
    total_size = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    # is_pinned is a method; testing the bare attribute is
                    # always truthy and would tag every tensor as pinned
                    print("%s:%s%s %s" % (type(obj).__name__,
                                          " GPU" if obj.is_cuda else "",
                                          " pinned" if obj.is_pinned() else "",
                                          pretty_size(obj.size())))
                    total_size += obj.numel()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                # Wrapper objects: inspect the tensor they carry in .data
                if not gpu_only or obj.data.is_cuda:
                    # Seven placeholders need seven values, including the GPU tag;
                    # requires_grad/volatile may be absent, so default to False
                    print("%s → %s:%s%s%s%s %s" % (type(obj).__name__,
                                                   type(obj.data).__name__,
                                                   " GPU" if obj.data.is_cuda else "",
                                                   " pinned" if obj.data.is_pinned() else "",
                                                   " grad" if getattr(obj, "requires_grad", False) else "",
                                                   " volatile" if getattr(obj, "volatile", False) else "",
                                                   pretty_size(obj.data.size())))
                    total_size += obj.data.numel()
        except Exception:
            # Some tracked objects raise on attribute access; skip them
            pass
    print("Total size:", total_size)

If I run something like this:

import torch as th
a = th.randn(10,1000,1000)
aa = a.cuda()
del aa
th.cuda.empty_cache()

you will not see any decrease in nvidia-smi/nvtop. But you can find out what is going on with the helper function above:

dump_tensors()

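You may see output similar to the following (the "pinned" label is printed whenever is_pinned is tested without being called, so it does not indicate pinned memory):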

Tensor: GPU pinned 10 × 1000 × 1000
Total size: 10000000

This means the tensor is still alive and tracked by the garbage collector, so its GPU memory has not been released.
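
To cross-check what nvidia-smi shows, PyTorch's own counters can be read from inside the process. The sketch below is only an illustration: `cuda_memory_report` is a hypothetical helper name, and the import guard is an assumption added so the snippet also runs on machines without PyTorch or a GPU. `torch.cuda.memory_allocated()` counts bytes held by live tensors, while `torch.cuda.memory_reserved()` counts bytes kept by the caching allocator, which is the part `torch.cuda.empty_cache()` can shrink.

```python
def cuda_memory_report():
    """Return PyTorch's CUDA memory counters in bytes, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None  # assumption: tolerate a machine without PyTorch
    if not torch.cuda.is_available():
        return None
    return {
        # Bytes occupied by tensors that are still referenced somewhere
        "allocated": torch.cuda.memory_allocated(),
        # Bytes held by the caching allocator (live tensors plus cached blocks)
        "reserved": torch.cuda.memory_reserved(),
    }


print(cuda_memory_report())
```

If "allocated" stays high after `del`, some reference to the tensor still exists. If only "reserved" stays high, calling `torch.cuda.empty_cache()` should lower it. nvidia-smi also includes the CUDA context itself, so its figure does not drop to zero either way.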

For more discussion of Python's gc mechanism, see:

Force garbage collection in Python to free memory
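
As a rough sketch of the forced-collection approach, the helper below runs a full collection and then empties PyTorch's cache. `release_cuda_memory` is a hypothetical name, and the import guard is an assumption so the snippet runs without PyTorch. Note the limits: `gc.collect()` only reclaims objects kept alive by reference cycles. A tensor that is still reachable through a name (for example `_` or IPython's `Out`) is not freed by it.

```python
import gc


def release_cuda_memory():
    """Collect unreachable objects, then return cached CUDA blocks to the driver.

    Returns the number of unreachable objects found by the collector.
    """
    collected = gc.collect()  # break reference cycles that may hold tensors
    try:
        import torch
    except ImportError:
        return collected  # assumption: tolerate a machine without PyTorch
    if torch.cuda.is_available():
        # Only blocks no longer backing any live tensor can be released
        torch.cuda.empty_cache()
    return collected


print(release_cuda_memory())
```

The order matters: collecting first gives the caching allocator a chance to mark the freed tensors' blocks as unused before `empty_cache()` hands them back.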
