Expected input batch_size (18) to match target batch_size (6)

How do I fix "Expected input batch_size (18) to match target batch_size (6)"?

Is an RNN for image classification only suitable for grayscale images? The program below works for grayscale image classification.

If RGB images are used instead, this error occurs:

Expected input batch_size (18) to match target batch_size (6)

at this line: loss = criterion(outputs,labels)
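For context, nn.CrossEntropyLoss requires the logits and the targets to share the same batch dimension: logits of shape (N, C) and targets of shape (N,). A minimal sketch with dummy tensors (stand-ins, not the poster's data) that reproduces the same error:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
outputs = torch.randn(18, 5)          # logits for 18 samples
labels = torch.randint(0, 5, (6,))    # targets for only 6 samples
criterion(outputs, labels)            # ValueError: Expected input batch_size (18) to match target batch_size (6)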

My train, valid and test data loaders are set up as follows.

import torch
import torch.nn as nn
import torchvision.transforms as tt
from torchvision.datasets import ImageFolder

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

input_size  = 300
inputH = 300
inputW = 300

#Data transform (normalization & data augmentation)
stats = ((0.4914,0.4822,0.4465),(0.2023,0.1994,0.2010))
train_resize_tfms = tt.Compose([tt.Resize((inputH,inputW),interpolation=2),tt.ToTensor(),tt.Normalize(*stats)])

train_tfms = tt.Compose([tt.Resize((inputH,inputW),interpolation=2),tt.RandomHorizontalFlip(),tt.ToTensor(),tt.Normalize(*stats)])
valid_tfms = tt.Compose([tt.Resize((inputH,inputW),interpolation=2),tt.ToTensor(),tt.Normalize(*stats)])
test_tfms = tt.Compose([tt.Resize((inputH,inputW),interpolation=2),tt.ToTensor(),tt.Normalize(*stats)])

#Create dataset
train_ds = ImageFolder('./data/train',train_tfms)
valid_ds = ImageFolder('./data/valid',valid_tfms)
test_ds = ImageFolder('./data/test',test_tfms)

from torch.utils.data import DataLoader
batch_size = 6

#Training data loader
train_dl = DataLoader(train_ds,batch_size,shuffle = True,num_workers = 8,pin_memory=True)
#Validation data loader
valid_dl = DataLoader(valid_ds,pin_memory=True)
#Test data loader
test_dl = DataLoader(test_ds,1,shuffle = False,num_workers = 1,pin_memory=True)
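For reference, one way to confirm what the training loader actually yields (assuming the ./data/train folder exists and contains RGB images) is to pull a single batch; with batch_size = 6 and 300x300 RGB inputs the shapes should come out as follows:

images, labels = next(iter(train_dl))   # grab one batch from the training loader
print(images.shape)                     # expected: torch.Size([6, 3, 300, 300])
print(labels.shape)                     # expected: torch.Size([6])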

My model is as follows.

num_steps = 300
hidden_size = 256 #size of hidden layers
num_classes = 5
num_epochs = 20
learning_rate = 0.001
# Recurrent neural network with a fully connected output layer
num_layers = 2 # 2 RNN layers are stacked  
class RNN(nn.Module):
    def __init__(self,input_size,hidden_size,num_layers,num_classes):
        super(RNN,self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size,hidden_size,num_layers,batch_first=True,dropout=0.2)#batch must come as the first dimension
        #our input needs to have shape
        #x -> (batch_size,seq,input_size)
        self.fc = nn.Linear(hidden_size,num_classes)#this fc comes after the RNN,so it takes the RNN's hidden size as input

    def forward(self,x):
        #according to the documentation of nn.RNN in pytorch,
        #the rnn needs (input,h_0) as inputs (h_0 is the initial hidden state)

        #the following is the initial hidden state
        h0 = torch.zeros(self.num_layers,x.size(0),self.hidden_size).to(device)#first one is the number of layers,the second one is the batch size
        #self.rnn returns two outputs: the first tensor contains the output features of the last layer for all time steps,
        #the second one is the final hidden state
        out,_ = self.rnn(x,h0)
        #out has shape (batch_size,num_steps,hidden_size)
        #we need to decode the hidden state of only the last time step
        #out: (N,300,256)
        #since we only need the last time step
        #out: (N,256)
        out = out[:,-1,:] #-1 selects the last time step; keep all N samples and all 256 hidden features
        out = self.fc(out)
        return out


stacked_rnn_model = RNN(input_size,hidden_size,num_layers,num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()#CrossEntropyLoss applies softmax internally,so the model outputs raw logits
#optimizer = torch.optim.Adam(stacked_rnn_model.parameters(),lr=learning_rate) #optimizer used gradient optimization using Adam 
optimizer = torch.optim.SGD(stacked_rnn_model.parameters(),lr=learning_rate)
# Train the model
n_total_steps = len(train_dl)
for epoch in range(num_epochs):
        t_losses=[]
        for i,(images,labels) in enumerate(train_dl):  
            # origin shape: [6,3,300,300]
            # resized: [6,300,300]
            images = images.reshape(-1,num_steps,input_size).to(device)
            print('images shape')
            print(images.shape)
            labels = labels.to(device)
            
            # Forward pass
            outputs = stacked_rnn_model(images)
            print('outputs shape')
            print(outputs.shape)
            loss = criterion(outputs,labels)
            t_losses.append(loss)
            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

Printing the image and output shapes gives:

images shape
torch.Size([18, 300, 300])
outputs shape
torch.Size([18, 5])

Where am I going wrong?

Solution

TL;DR: you are flattening the first two axes, namely batch and channels.


I'm not sure you're taking the right approach here, but I'll come back to that at the end.

In any case, let's look at the issue you're facing. You have a data loader that produces batches of shape (6,3,300,300), i.e. six 3-channel 300x300 images per batch. From the looks of it, you want to reshape each batch element from (3,300,300) into (step_size=300,-1).

However, with images.reshape(-1,num_steps,input_size) you are affecting the first axis, which you shouldn't. This would have the desired effect with single-channel images, since dim=1 would not be a "channel axis". In your case you have 3 channels, so the resulting shape is (6*3*300*300//300//300,300,300), which is (18,300,300), since num_steps=300 and input_size=300. As a result, you are left with 18 batch elements instead of 6.

Instead, what you want is to reshape with (batch_size,num_steps,-1), leaving the last axis (the flattened per-step input, i.e. input_size) with a variable size. This results in a shape of (6,300,900).
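To make the shape arithmetic concrete, here is a quick check on a dummy batch (shapes only, no real images):

import torch

x = torch.rand(6, 3, 300, 300)          # one batch as produced by train_dl
print(x.reshape(-1, 300, 300).shape)    # torch.Size([18, 300, 300]) -- batch and channels flattened together
print(x.reshape(6, 300, -1).shape)      # torch.Size([6, 300, 900])  -- batch size preserved, 900 features per step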


Here is a corrected and simplified snippet:

import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

batch_size = 6
channels = 3
inputH,inputW = 300,300
train_ds = TensorDataset(torch.rand(100,channels,inputH,inputW),torch.rand(100,5))
train_dl = DataLoader(train_ds,batch_size)

class RNN(nn.Module):
    def __init__(self,input_size,hidden_size,num_layers,num_classes):
        super(RNN,self).__init__()
        # (batch_size,seq,input_size)
        self.rnn = nn.RNN(input_size,hidden_size,num_layers,batch_first=True)
        # (batch_size,hidden_size)
        self.fc = nn.Linear(hidden_size,num_classes)
        # (batch_size,num_classes)

    def forward(self,x):
        out,_ = self.rnn(x)
        out = out[:,-1,:]
        out = self.fc(out)
        return out

num_steps = 300
input_size = inputH*inputW*channels//num_steps
hidden_size = 256
num_classes = 5
num_layers = 2

rnn = RNN(input_size,hidden_size,num_layers,num_classes)
for x,y in train_dl:
    print(x.shape,y.shape)
    images = x.reshape(batch_size,num_steps,-1)
    print(images.shape)
    outputs = rnn(images)
    print(outputs.shape)
    break
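Running this sketch should print torch.Size([6, 3, 300, 300]) and torch.Size([6, 5]) for the raw batch, then torch.Size([6, 300, 900]) after the reshape, and finally torch.Size([6, 5]) for the model output, i.e. the batch dimension stays at 6 throughout.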

As I said at the beginning, I'm a bit wary of this approach, because you are essentially feeding the RNN an RGB 300x300 image as a sequence of 300 flattened vectors... I can't say whether that makes sense in terms of training, or whether the model will be able to learn anything from it. I could be wrong!
