Google Colab: MLP execution time is not constant but decreasing?


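I currently work a lot with Google Colab and want to measure the inference time of a small MLP. However, when I run the code below in Google Colab, the reported runtime decreases with the number of times the notebook has been executed (i.e., pressing the play button again after the notebook has finished running).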

I am getting the following values:

1st execution: 0.03   s 
2nd execution: 0.005  s 
3rd execution: 0.0007 s

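I tested this on different machines and in different browsers. Note that I am aware that time.time() has a precision limit of about 1 ms on Unix systems; however, that does not explain this behavior.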
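The resolution of Python's clocks can be checked directly instead of assumed. A minimal stdlib-only sketch comparing `time.time` (used in the question) with `time.perf_counter`, which the Python documentation describes as the clock with the highest available resolution for measuring short durations; the printed numbers depend on the platform.

```python
import time

# Report what Python says about each clock on this machine.
for name in ("time", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name:>12}: resolution={info.resolution:.1e} s, monotonic={info.monotonic}")

# Timing a short piece of work with perf_counter instead of time.time.
start = time.perf_counter()
total = sum(range(100_000))
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.6f} s")
```

`perf_counter` is also monotonic, so system clock adjustments cannot make a measured interval negative.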

Is there some kind of caching in the GPU or in PyTorch? If so, why, and can I expect a similarly large speedup in the final deployed application?
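Whether such a speedup carries over to deployment depends on what a long-running process sees once it is warmed up. A minimal sketch of measuring that steady state with `torch.utils.benchmark.Timer`, which according to the PyTorch benchmarking documentation synchronizes CUDA before reading the clock (plain `time.time()` does not). `make_mlp` is a hypothetical stand-in with the same layer sizes as the `Model` class in the reproduction code, and the small CPU input is an assumption made only so the snippet also runs without a GPU.

```python
import torch
from torch.utils import benchmark


def make_mlp():
    # Hypothetical stand-in: 3 -> 512, five 512 -> 512 layers, 512 -> 1, ReLU after each.
    layers = [torch.nn.Linear(3, 512), torch.nn.ReLU()]
    for _ in range(5):
        layers += [torch.nn.Linear(512, 512), torch.nn.ReLU()]
    layers += [torch.nn.Linear(512, 1), torch.nn.ReLU()]
    return torch.nn.Sequential(*layers)


device = "cuda" if torch.cuda.is_available() else "cpu"
dim = 1024 if device == "cuda" else 32  # smaller input keeps a CPU run short
mlp = make_mlp().to(device).eval()
x = torch.rand((dim * dim, 3), device=device)  # already flattened to (N, 3)

with torch.no_grad():
    # A few untimed calls first, so one-off start-up costs are not measured.
    for _ in range(3):
        mlp(x)
    timer = benchmark.Timer(
        stmt="mlp(x)",
        globals={"mlp": mlp, "x": x},
        num_threads=torch.get_num_threads(),  # Timer defaults to a single thread
    )
    measurement = timer.timeit(10)

print(measurement)
print(f"mean per forward pass: {measurement.mean:.6f} s")
```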

Code to reproduce the behavior:

import time
import torch
import torch.nn.functional as F

class Model(torch.nn.Module):
  def __init__(self):
    super().__init__()

    # Seven fully connected layers: 3 -> 512 -> ... -> 512 -> 1
    self.fc1 = torch.nn.Linear(in_features=3, out_features=512)
    self.fc2 = torch.nn.Linear(in_features=512, out_features=512)
    self.fc3 = torch.nn.Linear(in_features=512, out_features=512)
    self.fc4 = torch.nn.Linear(in_features=512, out_features=512)
    self.fc5 = torch.nn.Linear(in_features=512, out_features=512)
    self.fc6 = torch.nn.Linear(in_features=512, out_features=512)
    self.fc7 = torch.nn.Linear(in_features=512, out_features=1)

  def forward(self, x):
    # x has shape (dim, dim, 3); flatten it to (dim*dim, 3) before the first layer
    in_dim = x.shape[0]
    x = F.relu(self.fc1(x.reshape(in_dim**2, 3)))
    x = F.relu(self.fc2(x))
    x = F.relu(self.fc3(x))
    x = F.relu(self.fc4(x))
    x = F.relu(self.fc5(x))
    x = F.relu(self.fc6(x))
    # Reshape the per-point outputs back to (dim, dim, 1)
    x = F.relu(self.fc7(x)).reshape((in_dim, in_dim, 1))
    return x

device = 'cuda'
mlp = Model().to(device)

dim = 1024
input_tensor = torch.rand((dim, dim, 3), device=device)

# Time a single forward pass (inference only, no gradients)
with torch.no_grad():
  start_time = time.time()
  out = mlp(input_tensor)
  end_time = time.time()
  print("Model FW pass {}p: {} seconds".format(input_tensor.shape[0], end_time - start_time))
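PyTorch launches CUDA kernels asynchronously, so two `time.time()` calls around `mlp(input_tensor)` with no synchronization point may capture little more than the kernel launches. A sketch, assuming the goal is to time the GPU work itself, that times several consecutive forward passes in one process with `torch.cuda.synchronize()` around each call. `time_calls` and the stand-in MLP are hypothetical names introduced here.

```python
import time
import torch


def time_calls(model, x, n_calls=5):
    """Time n_calls consecutive forward passes in the same process.

    On CUDA, synchronize before and after each call so the interval covers
    the kernels' execution, not only their asynchronous launch.
    """
    use_cuda = x.is_cuda
    times = []
    with torch.no_grad():
        for _ in range(n_calls):
            if use_cuda:
                torch.cuda.synchronize()
            start = time.perf_counter()
            model(x)
            if use_cuda:
                torch.cuda.synchronize()
            times.append(time.perf_counter() - start)
    return times


device = "cuda" if torch.cuda.is_available() else "cpu"
# Hypothetical stand-in with the same layer sizes as Model, taking (N, 3) input.
layers = [torch.nn.Linear(3, 512), torch.nn.ReLU()]
for _ in range(5):
    layers += [torch.nn.Linear(512, 512), torch.nn.ReLU()]
layers += [torch.nn.Linear(512, 1), torch.nn.ReLU()]
mlp = torch.nn.Sequential(*layers).to(device)

dim = 1024 if device == "cuda" else 32  # smaller input keeps a CPU run short
x = torch.rand((dim * dim, 3), device=device)

for i, t in enumerate(time_calls(mlp, x), start=1):
    print(f"call {i}: {t:.6f} s")
```

The question's own `mlp` and `input_tensor` can be passed to `time_calls` directly, since `Model.forward` does the reshaping itself. If the first call is much slower than the later ones, a likely explanation is one-off start-up work (for example CUDA context and cuBLAS initialization) rather than a cache that keeps getting faster; the steady-state numbers from the later calls are the ones to compare against a deployed, already-warmed-up process.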