多对多 LSTM PyTorch - 编程之家

如何解决多对多 LSTM PyTorch

我想在 PyTorch 中为 FashionMNIST 数据集构建一个 LSTM 模型。稍后我需要将其扩展到包含视频的不同数据集。

它应该获得一系列图像（FashionMNIST）作为输入（假设有 20 张图像），输出应该告诉我序列中有多少运动鞋（第 6 类）以及它们在序列中的位置。

我想知道这是否可以通过简单的 LSTM 或简单的 CNN 实现，或者我是否需要 CNN_LSTM？我尝试在 PyTorch 中实现一个 CNN_LSTM。您可以在下面找到我当前的模型（现在会引发错误）。最后一行抛出以下错误：“input must have 3维，得到4”（我还添加了错误信息的第一部分作为图片）。有人可以随时提供一些帮助吗？我这样做的方式正确吗？我无法修复错误，我不确定我的其余代码是否正确。我对 LSTM 很陌生。另外，我如何转换 FashionMNIST 数据集，使其始终包含 20 张图像的序列？

非常感谢！

class CNN(nn.Module):
  def __init__(self,K):
    super(CNN,self).__init__()
    self.layer1 = nn.Sequential(
        nn.Conv2d(in_channels=1,out_channels=32,kernel_size=3,padding=1),nn.Batchnorm2d(32),nn.ReLU(),nn.MaxPool2d(kernel_size=2,stride=2))


    self.layer2 = nn.Sequential(
        nn.Conv2d(in_channels=32,out_channels=64,kernel_size=3),nn.Batchnorm2d(64),nn.MaxPool2d(2))

    
    # three fully connected layer
    self.fc1 = nn.Linear(in_features=64*6*6,out_features=600)
    self.drop = nn.Dropout2d(0.25)
    self.fc2 = nn.Linear(in_features=600,out_features=120)
    self.fc3 = nn.Linear(in_features=120,out_features=10)
  
  def forward(self,x):
      out = self.layer1(x)
      out = self.layer2(out)
      out = out.view(out.size(0),-1)
      out = self.fc1(out)
      out = self.drop(out)
      out = self.fc2(out)
      out = self.fc3(out) 
      
      return out

class Combine(nn.Module):
  def __init__(self,K):
    super(Combine,self).__init__()
    self.cnn = CNN(K)
    self.D = 10  # n_inputs
    self.M = 128 # n_hidden
    self.K = 2 # n_outputs
    self.L = 10 # n_rnnlayers

    self.rnn = nn.LSTM(
        input_size=self.D,hidden_size=self.M,num_layers=self.L,batch_first=True)
    self.fc =nn.Linear(self.M,self.K)
  
  def forward(self,X):
    # initial hidden states
    h0 = torch.zeros(self.L,X.size(0),self.M).to(device)
    c0 = torch.zeros(self.L,self.M).to(device)

    # get RNN unit output
    out,_ = self.rnn(X,(h0,c0)) 
    out = self.fc(out)
    
    return out


model = Combine(K)

# use GPU in colab if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
model.to(device)

# Loss and optimizer
learning_rate = 0.001
criterion = nn.CrossEntropyLoss() # because mutli-class classification (includes softmax activation function for multi-class already)
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)

# Training and testing the model
def batch_gd(model,criterion,optimizer,train_loader,test_loader,epochs):
  train_losses = np.zeros(epochs)
  test_losses = np.zeros(epochs)

  # iterate over epochs
  for it in range(epochs):
    model.train()
    t0 = datetime.Now()
    train_loss = []
    for inputs,targets in train_loader:
      # move data to GPU
      #inputs = inputs.reshape(-1,28,28)
      inputs,targets = inputs.to(device),targets.to(device)

      # zero the parameter gradients (empty gradients) for backward pass
      # Initializing a gradient as 0 so there is no mixing of gradient among the batches
      optimizer.zero_grad()

      # Forward pass
      outputs = model(inputs)
      loss = criterion(outputs,targets)
        
      # Backward and optimize
      loss.backward() # propagating the error backward 
      optimizer.step() # optimizing the parameters

      train_loss.append(loss.item())

    # Get train loss and test loss
    train_loss = np.mean(train_loss) # a little misleading
    
    # evaluate model
    model.eval()
    test_loss = []
    for inputs,targets in test_loader: # test samples and targets
      # move data to GPU
      inputs,targets.to(device)
      outputs = model(inputs)
      loss = criterion(outputs,targets)
      test_loss.append(loss.item())
    test_loss = np.mean(test_loss)

    # Save losses
    train_losses[it] = train_loss
    test_losses[it] = test_loss
    
    dt = datetime.Now() - t0
    print(f'Epoch {it+1}/{epochs},Train Loss: {train_loss:.4f},\
      Test Loss: {test_loss:.4f},Duration: {dt}')
  
  return train_losses,test_losses

train_losses,test_losses = batch_gd(
    model,epochs=15)

Part of the Error Message

解决方法

这是一个有用的思想实验 - 您已将问题定义为一个连续的决策过程，但是运动鞋的展示顺序是否重要？

假设这是您的序列，其中 x 是非运动鞋，S 是运动鞋，并且您将要在位置 7 对图像进行分类：

xxSxxx?

此序列中位置 3 的运动鞋这一事实是否会影响您对当前运动鞋的决定？

它不应该 - 这意味着您实际上不应该将此视为顺序问题，并且不应该使用旨在对顺序依赖性进行建模的 RNN。相反，您可以将其视为简单地训练一个单个模型来对每个输入进行预测，独立其他输入。您可以在运动鞋的“序列”上运行此模型，并记录哪些是运动鞋，但当然顺序无关紧要:)